Add README, starter notebooks, and .gitignore

Add a comprehensive README that explains how to prepare a Python virtual environment, install packages, and register an ipykernel for running the lab notebooks in VS Code. Include two starter notebooks: TASK0_Datasaurus_Starter.ipynb (datasaurus warm-up with data loading, summary stats, optional SweetViz report, and faceted scatter plotting) and VisInt_Lab_01_Task_0.ipynb (lab header/metadata). Add .gitignore to exclude a local /.venv directory.
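
A quick way to sanity-check the setup this commit documents is sketched below. It is only an illustration, not part of the committed files: the helper names `in_virtualenv` and `missing_packages` are hypothetical, and the package list is an assumption drawn from the README's install commands plus `ipykernel`.

```python
import importlib.util
import sys

# Assumed package list: taken from the README's pip commands, plus ipykernel.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sweetviz", "dtale", "ipykernel"]


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


def missing_packages(names):
    """Return the names that cannot be located on the current import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside venv:", in_virtualenv())
    print("missing:", missing_packages(REQUIRED) or "none")
```

Run it with the interpreter from the activated environment; an empty "missing" list suggests the notebooks should at least import cleanly.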
2026-02-21 16:33:46 +00:00
parent 8787351455
commit 52e38435fa
4 changed files with 380 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/.venv
--- a/README.md
+++ b/README.md
@@ -1,2 +1,161 @@
 # VI_Lab_01_EDA

+
+
+
+
+
+
+
+pip install ipykernel
+
+***
+
+
+# 📘 README — Preparing Your Environment for Jupyter in VS Code
+
+## (Virtual Environment + Kernel Setup)
+
+This guide explains exactly how to prepare your system to run the EDA lab notebooks in **VS Code** using a clean Python **virtual environment**.
+
+The steps work on **Windows, macOS, and Linux**.
+
+***
+
+# 1. Install the Required Tools
+
+### 1.1 Install Python (3.9+ recommended)
+
+Download Python from the official site (*python.org*) or install it from the Microsoft Store.
+
+Platform notes:
+
+*   **Windows** → if you install from the official site, enable the **Add Python to PATH** option in the installer
+*   **macOS/Linux** → Python is usually preinstalled; upgrade it if the version is older than 3.9
+
+### 1.2 Install VS Code
+
+Install from the official VS Code site.
+
+### 1.3 Install VS Code Extensions
+
+Open VS Code → **Extensions Panel** → install:
+
+*   **Python**
+*   **Jupyter**
+
+These two extensions enable:
+
+*   Notebook execution
+*   Kernel selection
+*   Virtual environment detection
+*   Interactive cells
+
+***
+
+# 2. Create a Virtual Environment
+
+Choose a folder where you will store your lab materials.  
+Open a terminal *inside that folder*:
+
+### **Windows (PowerShell)**
+
+```powershell
+python -m venv venv
+.\venv\Scripts\activate
+```
+
+### **macOS / Linux**
+
+```bash
+python3 -m venv venv
+source venv/bin/activate
+```
+
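+You should now see `(venv)` at the start of your terminal prompt.
+
+> 💡 **Tip:**  
+> The repository's `.gitignore` excludes `/.venv`, not `venv`. If you create the environment inside a clone of the repository, name it `.venv` instead (and adjust the activation paths accordingly) so it is not committed by accident.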
+
+***
+
+# 3. Install Required Python Packages
+
+Inside the active virtual environment, run:
+
+```bash
+pip install numpy pandas matplotlib sweetviz dtale jupyter
+```
+
+If you are using the Task 0 datasets, also install:
+
+```bash
+pip install seaborn
+```
+
+> 💡 **Tip:**  
+> If you have a `requirements.txt` in the bundle, run:
+>
+> ```bash
+> pip install -r requirements.txt
+> ```
+
+***
+
+# 4. Register the Virtual Environment as a Jupyter Kernel
+
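+VS Code can usually detect your venv automatically, but registering it explicitly makes the kernel easy to find by name. The command below needs the `ipykernel` package in the active environment; run `pip install ipykernel` first if it is missing.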
+
+```bash
+python -m ipykernel install --user --name eda-env --display-name "EDA Lab Environment"
+```
+
+You will now see **EDA Lab Environment** as a selectable kernel inside VS Code notebooks.
+
+***
+
+# 5. ✅ Open the Lab in VS Code
+
+1.  Launch **VS Code**
+2.  Use **File → Open Folder** and choose the folder containing the lab files
+3.  Open any `.ipynb` file (e.g., `TASK0_Datasaurus_Starter.ipynb`)
+4.  At the top‑right corner of the notebook, click the **kernel selector**
+5.  Choose the kernel registered in step 4:  
+    **EDA Lab Environment**
+
+This ensures the notebook runs using the correct interpreter.
+
+***
+
+# 6. 🔍 (Optional) Verify Your Setup
+
+In a notebook cell, run:
+
+```python
+import sys
+sys.executable
+```
+
+It should show the Python path inside your `venv`, e.g.:
+
+*   Windows: `…/venv/Scripts/python.exe`
+*   macOS/Linux: `…/venv/bin/python`
+
+Then check that the packages are available:
+
+```python
+import pandas, sweetviz, dtale
+print("Environment OK")
+```
+
--- a/TASK0_Datasaurus_Starter.ipynb
+++ b/TASK0_Datasaurus_Starter.ipynb
@@ -0,0 +1,184 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Task 0 — Datasaurus Warm‑Up (Starter)\n",
+    "\n",
+    "**Goal:** Show why we must *always* visualize by comparing groups with nearly identical summary statistics but very different shapes when plotted.\n",
+    "\n",
+    "This starter uses `datasaurus_task0.csv` (long format) with four groups: `dino`, `star`, `circle`, `bullseye`.\n",
+    "\n",
+    "**What to do:**\n",
+    "1. Load the CSV (as strings first).\n",
+    "2. Compute basic stats per group (mean, std, correlation).\n",
+    "3. Generate a SweetViz report (optional but recommended).\n",
+    "4. Plot x vs y for each group (facet grid).\n",
+    "5. Write *2–4 sentences* reflecting on why summary stats were misleading.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 0) (Optional) Install packages in this environment\n",
+    "Uncomment if needed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "setup"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "# %pip install -q numpy pandas seaborn matplotlib sweetviz dtale\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1) Load the dataset\n",
+    "Load as strings first (safer), then coerce numeric columns."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "csv_path = 'datasaurus_task0.csv'\n",
+    "df_raw = pd.read_csv(csv_path, dtype=str)\n",
+    "df_raw.head()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Coerce numeric columns and quick sanity checks"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = df_raw.copy()\n",
+    "for c in ['x','y']:\n",
+    "    df[c] = pd.to_numeric(df[c], errors='coerce')\n",
+    "df.info()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2) Basic summary stats by group (fill in)\n",
+    "Compute mean, std for x & y by `dataset`, and the correlation within each group."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# TODO: groupby summaries\n",
+    "# g = df.groupby('dataset')\n",
+    "# means = g[['x','y']].mean()\n",
+    "# stds  = g[['x','y']].std()\n",
+    "# corr  = g[['x','y']].apply(lambda d: d['x'].corr(d['y']))  # Pearson r per group\n",
+    "# means, stds, corr\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3) (Optional) SweetViz profile\n",
+    "Generate a quick report to observe that top-level stats look very similar."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# import sweetviz as sv\n",
+    "# report = sv.analyze(df)\n",
+    "# report.show_html('task0_sweetviz_report.html')\n",
+    "# print('Wrote task0_sweetviz_report.html')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4) Visualize — scatter by group (facet)\n",
+    "Create a facet grid with one subplot per dataset and compare shapes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import seaborn as sns\n",
+    "import matplotlib.pyplot as plt\n",
+    "sns.set_theme(style='white', context='notebook')\n",
+    "g = sns.FacetGrid(df, col='dataset', col_wrap=2, height=4, sharex=True, sharey=True)\n",
+    "g.map_dataframe(sns.scatterplot, x='x', y='y', s=20, edgecolor=None)\n",
+    "g.set_titles('{col_name}')\n",
+    "for ax in g.axes.flatten():\n",
+    "    ax.set_xlabel('x'); ax.set_ylabel('y')\n",
+    "plt.tight_layout()\n",
+    "plt.show()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5) Reflection (write here)\n",
+    "**Prompt:** If the per-group mean/variance/correlation were similar, why do the plots look different?\n",
+    "- Which shapes do you observe?\n",
+    "- What does this imply for relying solely on `.describe()` or correlation before plotting?\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.11.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
--- a/VisInt_Lab_01_Task_0.ipynb
+++ b/VisInt_Lab_01_Task_0.ipynb
@@ -0,0 +1,36 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "64f0fe5d",
+   "metadata": {},
+   "source": [
+    "**43679 - Interactive Visualization**  \n",
+    "**2025 - 2026**  \n",
+    "*2nd semester*\n",
+    "\n",
+    "**Lab 01** - Task 0  \n",
+    "Exploring the value of visualization to go beyond descriptive statistics"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d9080704",
+   "metadata": {
+    "vscode": {
+     "languageId": "plaintext"
+    }
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}