2026-02-23 08:21:32 +00:00
parent 52e38435fa
commit ed360f9967
34 changed files with 73045 additions and 37 deletions


@@ -0,0 +1,145 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "4e23ea39",
"metadata": {},
"source": [
"\n",
"# 📘 Introductory EDA Lab (Instructor Version)\n",
"\n",
"## Learning Objectives\n",
"\n",
"Students should:\n",
"\n",
"- Understand what a dataset's structure looks like\n",
"- Identify variable types\n",
"- Compute descriptive statistics\n",
"- Recognize the limits of summary statistics\n",
"- Appreciate visualization as a fundamental step in EDA\n"
]
},
{
"cell_type": "markdown",
"id": "38ebe89c",
"metadata": {},
"source": [
"\n",
"## Teaching Strategy\n",
"\n",
"This is NOT a technical coding lab.\n",
"\n",
"It is conceptual, focusing on:\n",
"- Data structure awareness\n",
"- Reading metadata\n",
"- Interpreting statistics\n",
"- Understanding why visualization matters\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ecee2660",
"metadata": {},
"outputs": [],
"source": [
"\n",
"import pyreadr\n",
"import pandas as pd\n",
"import sweetviz as sv\n",
"import dtale\n",
"\n",
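"# read_r returns a dict-like mapping of R object names to DataFrames\n",
"result = pyreadr.read_r(\"datasaurus_dozen.rda\")\n",
"# The file is assumed to hold a single object, so take the first one\n",
"df = list(result.values())[0]\n",
"df.info()  # column names, dtypes, non-null counts (for the discussion below)\n",
"df.head()\n"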
]
},
{
"cell_type": "markdown",
"id": "ca5dfd49",
"metadata": {},
"source": [
"\n",
"## Discussion Prompts\n",
"\n",
"### After `df.info()`:\n",
"- Which variables are categorical?\n",
"- Which variables are numerical?\n",
"- Why does the data type matter?\n",
"\n",
"### After `df.describe()`:\n",
"Important insight:\n",
"Different datasets can share nearly identical summary statistics (such as means and standard deviations) yet look completely different when plotted.\n",
"\n",
"Ask:\n",
"Would you trust the numbers without a visualization?\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c61c04b",
"metadata": {},
"outputs": [],
"source": [
"\n",
"df.describe()\n"
]
},
{
"cell_type": "markdown",
"id": "5093ed70",
"metadata": {},
"source": [
"\n",
"## Sweetviz Discussion\n",
"\n",
"Use the report to show:\n",
"\n",
"- Similar means and standard deviations\n",
"- Very different visual distributions\n",
"- The importance of scatter plots\n",
"\n",
"Key message:\n",
"📌 \"Statistics describe. Visualization reveals.\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ae6139f",
"metadata": {},
"outputs": [],
"source": [
"\n",
"report = sv.analyze(df)\n",
"report.show_html(\"sweetviz_report.html\")\n"
]
},
{
"cell_type": "markdown",
"id": "d3a3d619",
"metadata": {},
"source": [
"\n",
"## Key Concept to Emphasize\n",
"\n",
"EDA is:\n",
"- Understanding structure\n",
"- Understanding distributions\n",
"- Detecting anomalies\n",
"- Preparing for cleaning\n",
"\n",
"Next lab:\n",
"Students receive messy datasets with:\n",
"- Missing values\n",
"- Wrong types\n",
"- Outliers\n",
"- Inconsistent categories\n"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}
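
The discussion prompt on `df.describe()` in the notebook above can be made concrete with a small sketch. The toy data here is hypothetical and is not the Datasaurus Dozen: the `dataset` column name and the values are assumptions chosen so that both groups contain exactly the same `x` values and the same `y` values, only paired differently. Grouping by the label column, rather than describing all rows pooled together, is what lets the per-dataset statistics be compared.

```python
import pandas as pd

# Hypothetical toy data (NOT the Datasaurus Dozen): each group holds the same
# x values and the same y values, but the (x, y) pairs are matched differently.
toy = pd.DataFrame({
    "dataset": ["line"] * 5 + ["scrambled"] * 5,
    "x": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "y": [1, 2, 3, 4, 5, 3, 5, 1, 4, 2],
})

# Per-group mean and standard deviation: identical for both groups.
summary = toy.groupby("dataset")[["x", "y"]].agg(["mean", "std"])
print(summary)

# The relationship between x and y is what differs, and the
# per-column summaries above cannot show it.
corr = {name: group["x"].corr(group["y"]) for name, group in toy.groupby("dataset")}
print(corr)
```

In class, the same `groupby` pattern could be applied to `df`, assuming the loaded data has a label column identifying each of the datasets; a scatter plot per group then shows what the matching summaries hide.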

File diff suppressed because one or more lines are too long
