Add README, starter notebooks, and .gitignore
Add a comprehensive README that explains how to prepare a Python virtual environment, install packages, and register an ipykernel for running the lab notebooks in VS Code. Include two starter notebooks: TASK0_Datasaurus_Starter.ipynb (datasaurus warm-up with data loading, summary stats, optional SweetViz report, and faceted scatter plotting) and VisInt_Lab_01_Task_0.ipynb (lab header/metadata). Add .gitignore to exclude a local /.venv directory.
1
.gitignore
vendored
Normal file
@@ -0,0 +1 @@
|
|||||||
|
/.venv
|
||||||
159
README.md
@@ -1,2 +1,161 @@
|
|||||||
# VI_Lab_01_EDA
|
# VI_Lab_01_EDA
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
pip install ipykernel
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
This README explains **how students should prepare their environment in VS Code**, including:
|
||||||
|
|
||||||
|
* Installing Python
|
||||||
|
* Creating a **virtual environment**
|
||||||
|
* Installing required packages
|
||||||
|
* Setting up the **Jupyter kernel** to use that venv
|
||||||
|
* Opening and running the notebooks in VS Code
|
||||||
|
|
||||||
|
The instructions follow the standard Python `venv` workflow together with the VS Code **Python** and **Jupyter** extensions, a workflow that has stayed stable across recent versions.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
# 📘 README — Preparing Your Environment for Jupyter in VS Code
|
||||||
|
|
||||||
|
## (Virtual Environment + Kernel Setup)
|
||||||
|
|
||||||
|
This guide explains exactly how to prepare your system to run the EDA lab notebooks in **VS Code** using a clean Python **virtual environment**.
|
||||||
|
|
||||||
|
The steps work on **Windows, macOS, and Linux**.
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
# 1. Install the Required Tools
|
||||||
|
|
||||||
|
### 1.1 Install Python (3.9+ recommended)
|
||||||
|
|
||||||
|
Download Python from the official site (*python.org*) or install it from the Microsoft Store.
|
||||||
|
|
||||||
|
Make sure to check:
|
||||||
|
|
||||||
|
* **Windows** → if you install from python.org, enable the **Add Python to PATH** option in the installer
|
||||||
|
* **macOS/Linux** → Python is usually included, but upgrade if needed
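
To confirm that the interpreter on your PATH meets the 3.9+ recommendation, a small check like the one below can be run with `python` (or `python3`). This is only a sketch; the helper name `meets_minimum` is made up for this example and is not part of the lab materials.

```python
import sys

def meets_minimum(version_info=sys.version_info, minimum=(3, 9)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    # Print the running interpreter's version and whether it is new enough.
    print("Python", sys.version.split()[0])
    print("OK" if meets_minimum() else "Please install Python 3.9 or newer")
```
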
|
||||||
|
|
||||||
|
### 1.2 Install VS Code
|
||||||
|
|
||||||
|
Install from the official VS Code site.
|
||||||
|
|
||||||
|
### 1.3 Install VS Code Extensions
|
||||||
|
|
||||||
|
Open VS Code → **Extensions Panel** → install:
|
||||||
|
|
||||||
|
* **Python**
|
||||||
|
* **Jupyter**
|
||||||
|
|
||||||
|
These two extensions enable:
|
||||||
|
|
||||||
|
* Notebook execution
|
||||||
|
* Kernel selection
|
||||||
|
* Virtual environment detection
|
||||||
|
* Interactive cells
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
# 2. Create a Virtual Environment
|
||||||
|
|
||||||
|
Choose a folder where you will store your lab materials.
|
||||||
|
Open a terminal *inside that folder*:
|
||||||
|
|
||||||
|
### **Windows (PowerShell)**
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
python -m venv venv
|
||||||
|
.\venv\Scripts\activate
|
||||||
|
```
|
||||||
|
|
||||||
|
### **macOS / Linux**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python3 -m venv venv
|
||||||
|
source venv/bin/activate
|
||||||
|
```
|
||||||
|
|
||||||
|
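You should now see `(venv)` at the start of your terminal prompt. Note that the repository's `.gitignore` only excludes `/.venv`; if you keep the name `venv`, add it to `.gitignore`, or name the environment `.venv` instead, so the environment is not committed.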
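
If the prompt prefix is not visible, the activation can also be checked from Python itself. The sketch below relies on the fact that inside a venv `sys.prefix` differs from `sys.base_prefix`; the function name `in_virtual_env` is an illustrative choice.

```python
import sys

def in_virtual_env():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix

print("Virtual environment active:", in_virtual_env())
print("Environment location:", sys.prefix)
```

Run it with the same `python` command you will use for `pip install`, so that both refer to the same interpreter.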
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
# 3. Install Required Python Packages
|
||||||
|
|
||||||
|
Inside the active virtual environment, run:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pip install numpy pandas matplotlib sweetviz dtale jupyter
|
||||||
|
```
|
||||||
|
|
||||||
|
If you are using the Task 0 datasets, also install:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pip install seaborn
|
||||||
|
```
|
||||||
|
|
||||||
|
> 💡 **Tip:**
|
||||||
|
> If you have a `requirements.txt` in the bundle, run:
|
||||||
|
>
|
||||||
|
> ```bash
|
||||||
|
> pip install -r requirements.txt
|
||||||
|
> ```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
# 4. Register the Virtual Environment as a Jupyter Kernel
|
||||||
|
|
||||||
|
VS Code can usually detect your venv automatically, but registering it explicitly makes the kernel easy to find by name. With the virtual environment still active, run:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m ipykernel install --user --name eda-env --display-name "EDA Lab Environment"
|
||||||
|
```
|
||||||
|
|
||||||
|
You will now see **EDA Lab Environment** as a selectable kernel inside VS Code notebooks.
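
Registration writes a small `kernel.json` file into a kernelspec folder (running `jupyter kernelspec list` shows where these folders live). As a rough illustration of what that file records, the sketch below reads a `kernel.json` and reports the display name and the interpreter it launches. The helper `describe_kernelspec` and the interpreter path in the sample are assumptions made for this example, not output from a real installation.

```python
import json
import tempfile
from pathlib import Path

def describe_kernelspec(spec_dir):
    """Return (display_name, interpreter) read from <spec_dir>/kernel.json."""
    spec = json.loads((Path(spec_dir) / "kernel.json").read_text(encoding="utf-8"))
    # argv[0] is the interpreter Jupyter launches for this kernel.
    return spec["display_name"], spec["argv"][0]

# Demonstration on a throwaway kernelspec; the interpreter path is made up.
with tempfile.TemporaryDirectory() as tmp:
    sample = {
        "argv": ["/path/to/venv/bin/python", "-m", "ipykernel_launcher",
                 "-f", "{connection_file}"],
        "display_name": "EDA Lab Environment",
        "language": "python",
    }
    (Path(tmp) / "kernel.json").write_text(json.dumps(sample), encoding="utf-8")
    name, interpreter = describe_kernelspec(tmp)
    print(name, "->", interpreter)
```

On a real setup, pointing `describe_kernelspec` at the `eda-env` folder should show an interpreter path inside your virtual environment.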
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
# 5. ✅ Open the Lab in VS Code
|
||||||
|
|
||||||
|
1. Launch **VS Code**
|
||||||
|
2. Use **File → Open Folder** and choose the folder containing the lab files
|
||||||
|
3. Open any `.ipynb` file (e.g., `TASK0_Datasaurus_Starter.ipynb`)
|
||||||
|
4. At the top‑right corner of the notebook, click the **kernel selector**
|
||||||
|
5. Choose:
|
||||||
|
**EDA Lab Environment**
|
||||||
|
|
||||||
|
This ensures the notebook runs using the correct interpreter.
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
# 6. 🔍 (Optional) Verify Your Setup
|
||||||
|
|
||||||
|
In a notebook cell, run:
|
||||||
|
|
||||||
|
```python
|
||||||
|
import sys
|
||||||
|
sys.executable
|
||||||
|
```
|
||||||
|
|
||||||
|
It should show the Python path inside your `venv`, e.g.:
|
||||||
|
|
||||||
|
* Windows: `…/venv/Scripts/python.exe`
|
||||||
|
* macOS/Linux: `…/venv/bin/python`
|
||||||
|
|
||||||
|
Then check that the packages are available:
|
||||||
|
|
||||||
|
```python
|
||||||
|
import pandas, sweetviz, dtale
|
||||||
|
print("Environment OK")
|
||||||
|
```
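
A plain `import` stops at the first missing package. As an alternative sketch, the cell below lists every missing package at once without importing any of them; the `REQUIRED` list simply mirrors the `pip install` commands in this README, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Packages named in the install commands above.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sweetviz", "dtale"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("Environment OK")
```
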
|
||||||
|
|
||||||
|
|||||||
184
TASK0_Datasaurus_Starter.ipynb
Normal file
@@ -0,0 +1,184 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Task 0 — Datasaurus Warm‑Up (Starter)\n",
|
||||||
|
"\n",
|
||||||
|
"**Goal:** Show why we must *always* visualize the data: compare groups whose summary statistics are nearly identical but whose shapes differ sharply when plotted.\n",
|
||||||
|
"\n",
|
||||||
|
"This starter uses `datasaurus_task0.csv` (long format) with four groups: `dino`, `star`, `circle`, `bullseye`.\n",
|
||||||
|
"\n",
|
||||||
|
"**What to do:**\n",
|
||||||
|
"1. Load the CSV (as strings first).\n",
|
||||||
|
"2. Compute basic stats per group (mean, std, correlation).\n",
|
||||||
|
"3. Generate a SweetViz report (optional but recommended).\n",
|
||||||
|
"4. Plot x vs y for each group (facet grid).\n",
|
||||||
|
"5. Write *2–4 sentences* reflecting on why summary stats were misleading.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 0) (Optional) Install packages in this environment\n",
|
||||||
|
"Uncomment if needed."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 0,
|
||||||
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"setup"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# !pip install -q numpy pandas seaborn matplotlib sweetviz dtale\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 1) Load the dataset\n",
|
||||||
|
"Load as strings first (safer), then coerce numeric columns."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 0,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"csv_path = 'datasaurus_task0.csv'\n",
|
||||||
|
"df_raw = pd.read_csv(csv_path, dtype=str)\n",
|
||||||
|
"df_raw.head()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Coerce numeric columns and quick sanity checks"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 0,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"df = df_raw.copy()\n",
|
||||||
|
"for c in ['x','y']:\n",
|
||||||
|
"    df[c] = pd.to_numeric(df[c], errors='coerce')\n",
|
||||||
|
"df.info()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 2) Basic summary stats by group (fill in)\n",
|
||||||
|
"Compute mean, std for x & y by `dataset`, and the correlation within each group."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 0,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# TODO: groupby summaries\n",
|
||||||
|
"# g = df.groupby('dataset')\n",
|
||||||
|
"# means = g[['x','y']].mean()\n",
|
||||||
|
"# stds = g[['x','y']].std()\n",
|
||||||
|
"# corr = g[['x','y']].apply(lambda d: d['x'].corr(d['y']))\n",
|
||||||
|
"# means, stds, corr\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 3) (Optional) SweetViz profile\n",
|
||||||
|
"Generate a quick report to observe that top-level stats look very similar."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 0,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# import sweetviz as sv\n",
|
||||||
|
"# report = sv.analyze(df)\n",
|
||||||
|
"# report.show_html('task0_sweetviz_report.html')\n",
|
||||||
|
"# print('Wrote task0_sweetviz_report.html')\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 4) Visualize — scatter by group (facet)\n",
|
||||||
|
"Create a facet grid with one subplot per dataset and compare shapes."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 0,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import seaborn as sns\n",
|
||||||
|
"import matplotlib.pyplot as plt\n",
|
||||||
|
"sns.set_theme(style='white', context='notebook')\n",
|
||||||
|
"g = sns.FacetGrid(df, col='dataset', col_wrap=2, height=4, sharex=True, sharey=True)\n",
|
||||||
|
"g.map_dataframe(sns.scatterplot, x='x', y='y', s=20, edgecolor=None)\n",
|
||||||
|
"g.set_titles('{col_name}')\n",
|
||||||
|
"for ax in g.axes.flatten():\n",
|
||||||
|
"    ax.set_xlabel('x'); ax.set_ylabel('y')\n",
|
||||||
|
"plt.tight_layout()\n",
|
||||||
|
"plt.show()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 5) Reflection (write here)\n",
|
||||||
|
"**Prompt:** If the per-group mean/variance/correlation were similar, why do the plots look different?\n",
|
||||||
|
"- Which shapes do you observe?\n",
|
||||||
|
"- What does this imply for relying solely on `.describe()` or correlation before plotting?\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": ".venv",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"name": "python",
|
||||||
|
"version": "3.11.9"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 5
|
||||||
|
}
|
||||||
36
VisInt_Lab_01_Task_0.ipynb
Normal file
@@ -0,0 +1,36 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "64f0fe5d",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"**43679 - Interactive Visualization**\n",
|
||||||
|
"**2025 - 2026**\n",
|
||||||
|
"*2nd semester*\n",
|
||||||
|
"\n",
|
||||||
|
"**Lab 01** - Task 0\n",
|
||||||
|
"Exploring the value of Visualization to go beyond descriptive statistics"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "d9080704",
|
||||||
|
"metadata": {
|
||||||
|
"vscode": {
|
||||||
|
"languageId": "plaintext"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": []
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"language_info": {
|
||||||
|
"name": "python"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 5
|
||||||
|
}
|
||||||