{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "e28cb3de",
"metadata": {},
"outputs": [],
"source": [
"# 43679 -- Interactive Visualization\n",
"# 2025 - 2026\n",
"# 2nd semester\n",
"# Lab 1 - EDA (guided)\n",
"# ver 1.2\n",
"# 24022026 - Cosmetics; added rationale for task in scope of course"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab 02
Task 2: Guided EDA and Data Cleaning\n",
"\n",
"The purpose of this task you to introduce you to the basic steps of performing data preparation for a dataset with several illustrative quality issues. In most situations you already have the basic code to be run; in others, you need to infer from existing code to complete the step. What is important here is for you to be able to identify the issues, understand the tools and approaches that may help tackling them, and acquire a systematic way of thinking about data preparation.\n",
"\n",
"**Don't just run the code. Understand why it is needed and what it is doing**\n",
"\n",
"**NOTE**: For those cells asking questions or with tables that can be filled, you can just double-click the cell and edit it with your answers and rationale\n",
"\n",
"**Dataset:** `dataset_A_indie_game_telemetry.csv`\n",
"\n",
"---\n",
"\n",
"### Objectives\n",
"\n",
"By the end of this task you will be able to:\n",
"- Use **SweetViz** to rapidly profile a dataset and identify issues\n",
"- Use **D-Tale** to navigate and inspect a dataframe interactively\n",
"- Use **pandas** to fix the most common categories of data quality problems\n",
"- Make and justify cleaning decisions rather than applying fixes mechanically\n",
"\n",
"### Tools and their roles in this task\n",
"\n",
"| Tool | Role |\n",
"|---|---|\n",
"| **SweetViz** | Automated profiling: generate a report, triage what needs fixing |\n",
"| **D-Tale** | Interactive navigation: browse rows, inspect value counts, confirm fixes visually |\n",
"| **pandas** | All actual cleaning: every transformation is explicit, reproducible code |\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 1 — Setup and First Look"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import sweetviz as sv\n",
"import dtale\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"import pygwalker as pyg"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape: (10000, 20)\n"
]
},
{
"data": {
"text/html": [
"
|  | session_id | user_id | start_time | end_time | session_length_s | region | platform | gpu_model | avg_fps | ping_ms | map_name | crash_flag | purchase_amount | party_size | input_method | build_version | is_featured_event | device_temp_c | session_type | is_long_session |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sess_c2fba8e7f37a | user_488 | 2025-07-18T18:32:00Z | 2025-07-18 20:03:21-05:00 | 5481.0 | us-west | pc | GTX1080 | 83.52 | 431.16 | ocean | Yes | 0,00 | 2 | Touch | NaN | No | 85.6 | ranked | True |
| 1 | sess_33d286298cf9 | user_1511 | 2025-06-13 23:21:08+00:00 | 2025-06-13 23:36:30+01:00 | 922.0 | Us-east | PlayStation | NaN | 72.75 | 29.12 | desert | No | 0.0 | 3 | Touch | NaN | 0 | 62.0 | casual | 0 |
| 2 | sess_be2bb4d8986a | user_830 | 2025-10-20 02:42:07-05:00 | 20/10/2025 02:49 | 451.0 | sa-east-1 | PlayStation | NaN | 69.20 | 40.47 | Forest | False | 0.0 | 5 | TOUCH | 1.4 | False | 69.0 | ranked | False |
| 3 | sess_7f425ca9a0e2 | user_1 | 08/01/2025 06:35 | 2025-08-01T08:32:45Z | 7031.0 | sa-east-1 | PlayStation | NaN | 33.29 | 92.40 | Desert | No | 17.55 | 1 | Controller | 1.3.2 | 0 | 48.1 | casual | True |
| 4 | sess_5657e28b22ec | user_211 | 2025-09-08T23:41:44Z | 2025-09-09 00:32:59+01:00 | 3075.0 | US-EAST | switch | NaN | 69.96 | 12.63 | Desert | False | 0.0 | 2 | controllr | NaN | 0 | 54.7 | casual | Yes |