{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Task 0 — Datasaurus Warm‑Up (Starter)\n", "\n", "**Goal:** Show why we must *always* visualize by comparing groups with nearly identical summary statistics but very different shapes when plotted.\n", "\n", "This starter uses `datasaurus_task0.csv` (long format) with four groups: `dino`, `star`, `circle`, `bullseye`.\n", "\n", "**What to do:**\n", "1. Load the CSV (as strings first).\n", "2. Compute basic stats per group (mean, std, correlation).\n", "3. Generate a SweetViz report (optional but recommended).\n", "4. Plot x vs y for each group (facet grid).\n", "5. Write *2–4 sentences* reflecting on why summary stats were misleading.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0) (Optional) Install packages in this environment\n", "Run the cell below only if these packages are not already installed." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "setup" ] }, "outputs": [], "source": [ "!pip install -q numpy pandas seaborn matplotlib sweetviz dtale\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1) Load the dataset\n", "Load as strings first (safer), then coerce numeric columns." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | dataset | x | y |
|---|---|---|---|
| 0 | dino | 61.05620938387066 | 50.000786041836115 |
| 1 | dino | 57.91495193642296 | 59.204465996007286 |
| 2 | dino | 61.47023196059987 | 43.113610600617946 |
| 3 | dino | 57.80035367010236 | 47.24321395851151 |
| 4 | dino | 53.58712459282577 | 50.05299250969186 |