VI_Lab_01_EDA/deploy/README.md
thepurpose d689ada45e Add deploy assets and update telemetry datasets
Prepare deployment package and clean telemetry/lab data: add deploy/ (README, datasaurus.csv, datasets and lab01 notebooks), add new lab02 dataset notebook variants (lab02_task1_datasets_v2/ v2b) and solutions for task3, and update multiple lab02 telemetry and git-activity notebooks. Clean and normalize claude/dataset_A_indie_game_telemetry_clean.csv (fill/standardize timestamps, session lengths and other fields) to improve consistency for downstream analysis.
2026-02-24 10:07:31 +00:00

# Lab 02 — Environment Setup
This document explains how to set up your Python environment and install all required packages before the lab session.
---
## Requirements
- **Python 3.10 or higher** (3.11 recommended; newer releases will probably work, but I have not tested them)
- **pip** (comes bundled with Python)
- A code editor with Jupyter notebook support — [VS Code](https://code.visualstudio.com/) with the [Jupyter extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) is recommended
---
## Step 1: Create a virtual environment
It is strongly recommended to work inside a virtual environment to avoid conflicts with other Python projects on your machine.
Open a terminal in the folder where you will work and run:
```bash
# Create the environment (only needed once)
python -m venv .venv
```
Then activate it:
```bash
# On Windows
.venv\Scripts\activate
# On macOS / Linux
source .venv/bin/activate
```
You should see `(.venv)` appear at the start of your terminal prompt. **You need to activate the environment every time you open a new terminal.**
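
If the prompt prefix is hard to spot, a short Python check can confirm the same thing. This is a minimal sketch that assumes the environment was created with `python -m venv` as above; the helper name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```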
---
## Step 2: Install required packages
With the environment active, run the following commands:
```bash
# Core data libraries
pip install "numpy<2.0"
pip install pandas matplotlib seaborn
# Automated EDA and profiling
pip install sweetviz
# Interactive dataframe explorer
pip install dtale
# Jupyter notebook support
pip install notebook ipykernel
```
> **Why `numpy<2.0`?** Several packages (including dtale and sweetviz) are not yet fully compatible with NumPy 2.x. Pinning to a 1.x version avoids runtime errors that can be difficult to diagnose.
Alternatively, you can install everything in a single command:
```bash
pip install "numpy<2.0" pandas matplotlib seaborn sweetviz dtale notebook ipykernel
```
---
## Step 3: Verify the installation
Run the following in a terminal (with the environment active) to confirm everything is working:
```bash
python -c "
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sweetviz as sv
import dtale
import numpy as np
print('numpy :', np.__version__)
print('pandas :', pd.__version__)
print('seaborn :', sns.__version__)
print('sweetviz: OK')
print('dtale : OK')
print('All packages installed successfully.')
"
```
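The multi-line `python -c` form may not work in every Windows shell. As an alternative, the sketch below can be saved to a file and run with `python`, or pasted into a notebook cell. It reads installed versions from package metadata instead of importing each library. The function name `installed_versions` and the `REQUIRED` list are illustrative, not part of the lab materials.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the install commands in Step 2.
REQUIRED = [
    "numpy", "pandas", "matplotlib", "seaborn",
    "sweetviz", "dtale", "notebook", "ipykernel",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:<12}: {ver if ver is not None else 'NOT INSTALLED'}")
```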
---
## Step 4: D-Tale in VS Code (Windows)
D-Tale opens in a browser tab via a local server. On Windows, VS Code may not automatically forward the port if D-Tale binds to a network adapter other than the loopback address. All lab notebooks already include the correct launch code:
```python
d = dtale.show(df, host='127.0.0.1', subprocess=False, open_browser=False)
print('Open D-Tale at:', d._url)
```
Because the launch code sets `open_browser=False`, no browser tab opens on its own: copy the URL from the cell output and paste it into your browser. If the page does not load, check the **Ports** panel at the bottom of VS Code and confirm that the port shown in the URL (typically `40000`) is being forwarded.
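
To tell a D-Tale server that never started from a port-forwarding problem, it can help to test the port directly from Python. This is a rough sketch using only the standard library; `port_is_open` is a hypothetical helper, and the port number should be replaced with the one in the printed URL if it differs from `40000`.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 40000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("D-Tale port reachable:", port_is_open())
```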
---
## Files for this lab
| File | Description |
|---|---|
| `lab01_task1_datasets.ipynb` | Task 1 — Datasaurus Dozen: why visualisation is essential |
| `lab01_task2_telemetry.ipynb` | Task 2 — Guided EDA and cleaning of game telemetry data |
| `lab01_task3_git_activity.ipynb` | Task 3 — Independent EDA and cleaning of Git classroom activity data |
| `datasaurus.csv` | Dataset for Task 1 |
| `dataset_A_indie_game_telemetry.csv` | Dataset for Task 2 |
| `dataset_D_git_classroom_activity.csv` | Dataset for Task 3 |
---
## Troubleshooting
**`ModuleNotFoundError` when running a notebook**
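The notebook is running on a different Python kernel than the one in your virtual environment. In VS Code, click the kernel name in the top right of the notebook and select the interpreter from your `.venv` folder (it appears as **Python (lab02)** only if a kernel was registered under that name).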
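
To see which interpreter a notebook is actually running on, a cell like the following can be used. It is a small sketch that assumes the environment folder is named `.venv` as in Step 1; `kernel_uses_venv` is an illustrative name.

```python
import sys
from pathlib import Path


def kernel_uses_venv(venv_dir: str = ".venv") -> bool:
    """Return True if the running interpreter sits inside a folder named venv_dir."""
    # sys.executable is e.g. <project>/.venv/bin/python on macOS / Linux
    # or <project>\.venv\Scripts\python.exe on Windows.
    return venv_dir in Path(sys.executable).parts


print("Kernel interpreter:", sys.executable)
print("Running from .venv:", kernel_uses_venv())
```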
**NumPy version conflict errors**
Make sure you installed `numpy<2.0` as described in Step 2. If you already have a newer version, downgrade with:
```bash
pip install "numpy<2.0" --force-reinstall
```
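After reinstalling, the result can be confirmed without importing NumPy itself. This is a minimal sketch that assumes the usual `major.minor.patch` version format; `is_numpy_1x` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def is_numpy_1x(version_string: str) -> bool:
    """Return True when the version string has a major version below 2."""
    major = int(version_string.split(".")[0])
    return major < 2


try:
    installed = version("numpy")
except PackageNotFoundError:
    print("numpy is not installed in this environment")
else:
    status = "OK" if is_numpy_1x(installed) else "too new, reinstall with numpy<2.0"
    print(f"numpy {installed}: {status}")
```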