a lot
145
gpt-sugg/Datasaurus_Lab_Instructor_Intro.ipynb
Normal file
@@ -0,0 +1,145 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4e23ea39",
   "metadata": {},
   "source": [
    "\n",
    "# 📘 Instructor Version – Introductory EDA Lab\n",
    "\n",
    "## Learning Objectives\n",
    "\n",
    "Students should:\n",
    "\n",
    "- Understand what a dataset structure looks like\n",
    "- Identify variable types\n",
    "- Compute descriptive statistics\n",
    "- Recognize the limits of summary statistics\n",
    "- Appreciate visualization as a fundamental step in EDA\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38ebe89c",
   "metadata": {},
   "source": [
    "\n",
    "## Teaching Strategy\n",
    "\n",
    "This is NOT a technical coding lab.\n",
    "\n",
    "It is conceptual:\n",
    "- Data structure awareness\n",
    "- Reading metadata\n",
    "- Interpreting statistics\n",
    "- Understanding why visualization matters\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ecee2660",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "import pyreadr\n",
    "import pandas as pd\n",
    "import sweetviz as sv\n",
    "import dtale\n",
    "\n",
    "result = pyreadr.read_r(\"datasaurus_dozen.rda\")\n",
    "df = list(result.values())[0]\n",
    "df.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca5dfd49",
   "metadata": {},
   "source": [
    "\n",
    "## Discussion Prompts\n",
    "\n",
    "### After df.info():\n",
    "- What is categorical?\n",
    "- What is numerical?\n",
    "- Why does data type matter?\n",
    "\n",
    "### After df.describe():\n",
    "Important insight:\n",
    "Different datasets may have nearly identical summary statistics.\n",
    "\n",
    "Ask:\n",
    "Would you trust the numbers without visualization?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c61c04b",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "df.describe()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5093ed70",
   "metadata": {},
   "source": [
    "\n",
    "## Sweetviz Discussion\n",
    "\n",
    "Use the report to show:\n",
    "\n",
    "- Similar means and standard deviations\n",
    "- Very different visual distributions\n",
    "- The importance of scatter plots\n",
    "\n",
    "Key message:\n",
    "📌 \"Statistics describe. Visualization reveals.\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ae6139f",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "report = sv.analyze(df)\n",
    "report.show_html(\"sweetviz_report.html\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3a3d619",
   "metadata": {},
   "source": [
    "\n",
    "## Key Concept to Emphasize\n",
    "\n",
    "EDA is:\n",
    "- Understanding structure\n",
    "- Understanding distributions\n",
    "- Detecting anomalies\n",
    "- Preparing for cleaning\n",
    "\n",
    "Next lab:\n",
    "Students receive messy datasets with:\n",
    "- Missing values\n",
    "- Wrong types\n",
    "- Outliers\n",
    "- Inconsistent categories\n"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}
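
Note on the `df.describe()` cell in the notebook above: called on the whole frame, `describe()` pools every row into one summary, so it does not by itself show that separate datasets share near-identical statistics. A minimal sketch of a per-group comparison follows. It uses a small toy frame invented for illustration (not the Datasaurus Dozen values), and the column names `dataset`, `x`, and `y` are assumptions about how the loaded frame is labelled.

```python
import pandas as pd

# Toy stand-in data, invented for illustration only (NOT the Datasaurus Dozen).
# Both groups share the same x values and the same multiset of y values,
# but the y values are arranged differently against x.
toy = pd.DataFrame({
    "dataset": ["a"] * 4 + ["b"] * 4,
    "x": [0, 1, 2, 3, 0, 1, 2, 3],
    "y": [-1, 1, -1, 1, 1, 1, -1, -1],
})

# Per-group mean and standard deviation: these come out the same for both groups.
summary = toy.groupby("dataset")[["x", "y"]].agg(["mean", "std"])
print(summary)

# Per-group correlation between x and y: this is where the groups differ.
corr = {name: g["x"].corr(g["y"]) for name, g in toy.groupby("dataset")}
print(corr)
```

With the real data, the same `groupby` call on `df` would be the natural lead-in to the "Would you trust the numbers without visualization?" prompt, provided the frame actually carries a grouping column.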
2720
gpt-sugg/Datasaurus_Lab_Student_Intro.ipynb
Normal file
File diff suppressed because one or more lines are too long
3894
gpt-sugg/scatter_export_1771754524118.html
Normal file
File diff suppressed because one or more lines are too long
2053
gpt-sugg/sweetviz_report.html
Normal file
File diff suppressed because one or more lines are too long