Overview
Dataset statistics
| Number of variables | 7 |
|---|---|
| Number of observations | 344 |
| Missing cells | 19 |
| Missing cells (%) | 0.8% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 18.9 KiB |
| Average record size in memory | 56.4 B |
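The missing-cell percentage above can be checked by hand. The sketch below assumes the percentage is taken over all cells (variables × observations) and uses the per-column missing counts reported later in this report; the variable names are illustrative only.

```python
# Figures copied from the Dataset statistics table.
n_variables = 7
n_observations = 344
missing_cells = 19

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Per-column missing counts as listed in the Variables section:
# 2 for each of the four numeric columns and 11 for sex.
missing_by_column = {
    "bill_length_mm": 2,
    "bill_depth_mm": 2,
    "flipper_length_mm": 2,
    "body_mass_g": 2,
    "sex": 11,
}

print(total_cells)                      # number of cells in the dataset
print(sum(missing_by_column.values()))  # should agree with "Missing cells"
print(f"{missing_pct:.1f}%")            # rounded share of missing cells
```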
Variable types
| Categorical | 3 |
|---|---|
| Numeric | 4 |
Alerts
| bill_depth_mm is highly overall correlated with flipper_length_mm and 2 other fields | High correlation |
|---|---|
| bill_length_mm is highly overall correlated with body_mass_g and 3 other fields | High correlation |
| body_mass_g is highly overall correlated with bill_length_mm and 3 other fields | High correlation |
| flipper_length_mm is highly overall correlated with bill_depth_mm and 4 other fields | High correlation |
| island is highly overall correlated with flipper_length_mm and 1 other field | High correlation |
| sex is highly overall correlated with bill_depth_mm and 2 other fields | High correlation |
| species is highly overall correlated with bill_depth_mm and 4 other fields | High correlation |
| sex has 11 (3.2%) missing values | Missing |
Reproduction
| Analysis started | 2026-02-22 11:30:32.868705 |
|---|---|
| Analysis finished | 2026-02-22 11:30:34.289502 |
| Duration | 1.42 seconds |
| Software version | ydata-profiling v4.18.1 |
| Download configuration | config.json |
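A report of this kind could be regenerated along the following lines. This is a sketch under assumptions, not the configuration actually used: it assumes the table matches the `penguins` example dataset shipped with seaborn (same seven column names and 344 rows), and the function name, report title and output file name are hypothetical.

```python
def build_report(output_path="penguins_report.html"):
    """Profile the penguins table and write an HTML report (sketch)."""
    # Imports are kept local so that defining the function does not
    # require the third-party packages to be installed.
    import seaborn as sns
    from ydata_profiling import ProfileReport

    # Assumption: the profiled data is seaborn's "penguins" example dataset.
    df = sns.load_dataset("penguins")

    # Default settings; the report above may have used a custom config.json.
    report = ProfileReport(df, title="Penguins profiling report")
    report.to_file(output_path)
    return report
```

Calling `build_report()` needs `seaborn` and `ydata-profiling` installed, plus network access the first time the dataset is fetched.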
Variables
species
Categorical
High correlation
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.8 KiB |
| Adelie | 152 |
|---|---|
| Gentoo | 124 |
| Chinstrap | 68 |
Length
| Max length | 9 |
|---|---|
| Median length | 6 |
| Mean length | 6.5930233 |
| Min length | 6 |
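The length statistics follow directly from the category counts listed under Common Values for this variable. A minimal sketch, assuming each row contributes the character length of its category label:

```python
from statistics import median

# Category counts taken from the Common Values table for species.
value_counts = {"Adelie": 152, "Gentoo": 124, "Chinstrap": 68}

# One length per row, as the column would look when profiled.
lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]

stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": sum(lengths) / len(lengths),
    "min": min(lengths),
}
print(stats)
```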
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Adelie |
|---|---|
| 2nd row | Adelie |
| 3rd row | Adelie |
| 4th row | Adelie |
| 5th row | Adelie |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Adelie | 152 | 44.2% |
| Gentoo | 124 | 36.0% |
| Chinstrap | 68 | 19.8% |
Length
Histogram of lengths of the category
Common Values (Plot)
| Value | Count | Frequency (%) |
| adelie | 152 | |
| gentoo | 124 | |
| chinstrap | 68 |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 428 | 18.9% |
| o | 248 | 10.9% |
| i | 220 | 9.7% |
| n | 192 | 8.5% |
| t | 192 | 8.5% |
| A | 152 | 6.7% |
| d | 152 | 6.7% |
| l | 152 | 6.7% |
| G | 124 | 5.5% |
| C | 68 | 3.0% |
| Other values (5) | 340 | 15.0% |
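The character counts can be reproduced from the category counts alone. The sketch below assumes every character of each label is counted once per row and that frequencies are shares of the total character count:

```python
from collections import Counter

# Category counts taken from the Common Values table for species.
value_counts = {"Adelie": 152, "Gentoo": 124, "Chinstrap": 68}

# Weight each character of a label by the number of rows carrying that label.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
for ch, n in char_counts.most_common(5):
    print(ch, n, f"{100 * n / total_chars:.1f}%")
```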
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2268 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2268 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2268 |
island
Categorical
High correlation
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.8 KiB |
| Biscoe | 168 |
|---|---|
| Dream | 124 |
| Torgersen | 52 |
Length
| Max length | 9 |
|---|---|
| Median length | 6 |
| Mean length | 6.0930233 |
| Min length | 5 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Torgersen |
|---|---|
| 2nd row | Torgersen |
| 3rd row | Torgersen |
| 4th row | Torgersen |
| 5th row | Torgersen |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Biscoe | 168 | 48.8% |
| Dream | 124 | 36.0% |
| Torgersen | 52 | 15.1% |
Length
Histogram of lengths of the category
Common Values (Plot)
| Value | Count | Frequency (%) |
| biscoe | 168 | |
| dream | 124 | |
| torgersen | 52 | 15.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 396 | 18.9% |
| r | 228 | 10.9% |
| s | 220 | 10.5% |
| o | 220 | 10.5% |
| B | 168 | 8.0% |
| i | 168 | 8.0% |
| c | 168 | 8.0% |
| D | 124 | 5.9% |
| a | 124 | 5.9% |
| m | 124 | 5.9% |
| Other values (3) | 156 | 7.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2096 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2096 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2096 |
bill_length_mm
Real number (ℝ)
High correlation
| Distinct | 164 |
|---|---|
| Distinct (%) | 48.0% |
| Missing | 2 |
| Missing (%) | 0.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 43.92193 |
| Minimum | 32.1 |
|---|---|
| Maximum | 59.6 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.8 KiB |
Quantile statistics
| Minimum | 32.1 |
|---|---|
| 5-th percentile | 35.7 |
| Q1 | 39.225 |
| median | 44.45 |
| Q3 | 48.5 |
| 95-th percentile | 51.995 |
| Maximum | 59.6 |
| Range | 27.5 |
| Interquartile range (IQR) | 9.275 |
Descriptive statistics
| Standard deviation | 5.4595837 |
|---|---|
| Coefficient of variation (CV) | 0.124302 |
| Kurtosis | -0.87602697 |
| Mean | 43.92193 |
| Median Absolute Deviation (MAD) | 4.75 |
| Skewness | 0.053118067 |
| Sum | 15021.3 |
| Variance | 29.807054 |
| Monotonicity | Not monotonic |
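Several of the figures above are derived from one another, which gives a quick consistency check. The sketch below uses only numbers printed in this section for bill_length_mm; the assumption that the mean is taken over the 342 non-missing rows (344 observations minus 2 missing) comes from the counts reported for this variable.

```python
# Values copied from the bill_length_mm tables above.
std = 5.4595837
mean = 43.92193
total = 15021.3
q1, q3 = 39.225, 48.5
minimum, maximum = 32.1, 59.6
non_missing = 344 - 2  # observations minus missing values

cv = std / mean                    # coefficient of variation
variance = std ** 2                # variance is the squared standard deviation
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum    # range
mean_from_sum = total / non_missing

print(round(cv, 6), round(variance, 4), round(iqr, 3), round(value_range, 1))
```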
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 41.1 | 7 | 2.0% |
| 45.2 | 6 | 1.7% |
| 45.5 | 5 | 1.5% |
| 39.6 | 5 | 1.5% |
| 50.5 | 5 | 1.5% |
| 46.5 | 5 | 1.5% |
| 50 | 5 | 1.5% |
| 37.8 | 5 | 1.5% |
| 46.2 | 5 | 1.5% |
| 46.4 | 4 | 1.2% |
| Other values (154) | 290 | 84.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 32.1 | 1 | 0.3% |
| 33.1 | 1 | 0.3% |
| 33.5 | 1 | 0.3% |
| 34 | 1 | 0.3% |
| 34.1 | 1 | 0.3% |
| 34.4 | 1 | 0.3% |
| 34.5 | 1 | 0.3% |
| 34.6 | 2 | 0.6% |
| 35 | 2 | 0.6% |
| 35.1 | 1 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 59.6 | 1 | 0.3% |
| 58 | 1 | 0.3% |
| 55.9 | 1 | 0.3% |
| 55.8 | 1 | 0.3% |
| 55.1 | 1 | 0.3% |
| 54.3 | 1 | 0.3% |
| 54.2 | 1 | 0.3% |
| 53.5 | 1 | 0.3% |
| 53.4 | 1 | 0.3% |
| 52.8 | 1 | 0.3% |
bill_depth_mm
Real number (ℝ)
High correlation
| Distinct | 80 |
|---|---|
| Distinct (%) | 23.4% |
| Missing | 2 |
| Missing (%) | 0.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 17.15117 |
| Minimum | 13.1 |
|---|---|
| Maximum | 21.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.8 KiB |
Quantile statistics
| Minimum | 13.1 |
|---|---|
| 5-th percentile | 13.9 |
| Q1 | 15.6 |
| median | 17.3 |
| Q3 | 18.7 |
| 95-th percentile | 20 |
| Maximum | 21.5 |
| Range | 8.4 |
| Interquartile range (IQR) | 3.1 |
Descriptive statistics
| Standard deviation | 1.9747932 |
|---|---|
| Coefficient of variation (CV) | 0.11514044 |
| Kurtosis | -0.90686609 |
| Mean | 17.15117 |
| Median Absolute Deviation (MAD) | 1.5 |
| Skewness | -0.14346463 |
| Sum | 5865.7 |
| Variance | 3.899808 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 17 | 12 | 3.5% |
| 15 | 10 | 2.9% |
| 18.6 | 10 | 2.9% |
| 17.9 | 10 | 2.9% |
| 18.5 | 10 | 2.9% |
| 17.3 | 9 | 2.6% |
| 18.9 | 9 | 2.6% |
| 19 | 9 | 2.6% |
| 17.8 | 9 | 2.6% |
| 18.1 | 9 | 2.6% |
| Other values (70) | 245 | 71.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 13.1 | 1 | 0.3% |
| 13.2 | 1 | 0.3% |
| 13.3 | 1 | 0.3% |
| 13.4 | 1 | 0.3% |
| 13.5 | 2 | 0.6% |
| 13.6 | 1 | 0.3% |
| 13.7 | 6 | 1.7% |
| 13.8 | 4 | 1.2% |
| 13.9 | 4 | 1.2% |
| 14 | 2 | 0.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 21.5 | 1 | 0.3% |
| 21.2 | 2 | 0.6% |
| 21.1 | 3 | 0.9% |
| 20.8 | 1 | 0.3% |
| 20.7 | 3 | 0.9% |
| 20.6 | 1 | 0.3% |
| 20.5 | 1 | 0.3% |
| 20.3 | 3 | 0.9% |
| 20.2 | 1 | 0.3% |
| 20.1 | 1 | 0.3% |
flipper_length_mm
Real number (ℝ)
High correlation
| Distinct | 55 |
|---|---|
| Distinct (%) | 16.1% |
| Missing | 2 |
| Missing (%) | 0.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 200.9152 |
| Minimum | 172 |
|---|---|
| Maximum | 231 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.8 KiB |
Quantile statistics
| Minimum | 172 |
|---|---|
| 5-th percentile | 181 |
| Q1 | 190 |
| median | 197 |
| Q3 | 213 |
| 95-th percentile | 225 |
| Maximum | 231 |
| Range | 59 |
| Interquartile range (IQR) | 23 |
Descriptive statistics
| Standard deviation | 14.061714 |
|---|---|
| Coefficient of variation (CV) | 0.0699883 |
| Kurtosis | -0.98427289 |
| Mean | 200.9152 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 0.34568183 |
| Sum | 68713 |
| Variance | 197.73179 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 190 | 22 | 6.4% |
| 195 | 17 | 4.9% |
| 187 | 16 | 4.7% |
| 193 | 15 | 4.4% |
| 210 | 14 | 4.1% |
| 191 | 13 | 3.8% |
| 215 | 12 | 3.5% |
| 196 | 10 | 2.9% |
| 197 | 10 | 2.9% |
| 185 | 9 | 2.6% |
| Other values (45) | 204 | 59.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 172 | 1 | 0.3% |
| 174 | 1 | 0.3% |
| 176 | 1 | 0.3% |
| 178 | 4 | 1.2% |
| 179 | 1 | 0.3% |
| 180 | 5 | 1.5% |
| 181 | 7 | 2.0% |
| 182 | 3 | 0.9% |
| 183 | 2 | 0.6% |
| 184 | 7 | 2.0% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 231 | 1 | 0.3% |
| 230 | 7 | 2.0% |
| 229 | 2 | 0.6% |
| 228 | 4 | 1.2% |
| 226 | 1 | 0.3% |
| 225 | 4 | 1.2% |
| 224 | 3 | 0.9% |
| 223 | 2 | 0.6% |
| 222 | 6 | 1.7% |
| 221 | 5 | 1.5% |
body_mass_g
Real number (ℝ)
High correlation
| Distinct | 94 |
|---|---|
| Distinct (%) | 27.5% |
| Missing | 2 |
| Missing (%) | 0.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4201.7544 |
| Minimum | 2700 |
|---|---|
| Maximum | 6300 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.8 KiB |
Quantile statistics
| Minimum | 2700 |
|---|---|
| 5-th percentile | 3150 |
| Q1 | 3550 |
| median | 4050 |
| Q3 | 4750 |
| 95-th percentile | 5650 |
| Maximum | 6300 |
| Range | 3600 |
| Interquartile range (IQR) | 1200 |
Descriptive statistics
| Standard deviation | 801.95454 |
|---|---|
| Coefficient of variation (CV) | 0.19086183 |
| Kurtosis | -0.71922187 |
| Mean | 4201.7544 |
| Median Absolute Deviation (MAD) | 600 |
| Skewness | 0.47032933 |
| Sum | 1437000 |
| Variance | 643131.08 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 3800 | 12 | 3.5% |
| 3700 | 11 | 3.2% |
| 3900 | 10 | 2.9% |
| 3950 | 10 | 2.9% |
| 3550 | 9 | 2.6% |
| 4400 | 8 | 2.3% |
| 4300 | 8 | 2.3% |
| 3450 | 8 | 2.3% |
| 3400 | 8 | 2.3% |
| 3600 | 7 | 2.0% |
| Other values (84) | 251 | 73.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2700 | 1 | 0.3% |
| 2850 | 2 | 0.6% |
| 2900 | 4 | 1.2% |
| 2925 | 1 | 0.3% |
| 2975 | 1 | 0.3% |
| 3000 | 2 | 0.6% |
| 3050 | 4 | 1.2% |
| 3075 | 1 | 0.3% |
| 3100 | 1 | 0.3% |
| 3150 | 4 | 1.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6300 | 1 | 0.3% |
| 6050 | 1 | 0.3% |
| 6000 | 2 | 0.6% |
| 5950 | 2 | 0.6% |
| 5850 | 3 | 0.9% |
| 5800 | 2 | 0.6% |
| 5750 | 1 | 0.3% |
| 5700 | 5 | 1.5% |
| 5650 | 3 | 0.9% |
| 5600 | 2 | 0.6% |
sex
Categorical
High correlation, Missing
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 11 |
| Missing (%) | 3.2% |
| Memory size | 2.8 KiB |
| Male | 168 |
|---|---|
| Female | 165 |
Length
| Max length | 6 |
|---|---|
| Median length | 4 |
| Mean length | 4.990991 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Male |
|---|---|
| 2nd row | Female |
| 3rd row | Female |
| 4th row | Female |
| 5th row | Male |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Male | 168 | 48.8% |
| Female | 165 | 48.0% |
| (Missing) | 11 | 3.2% |
Length
Histogram of lengths of the category
Common Values (Plot)
| Value | Count | Frequency (%) |
| male | 168 | |
| female | 165 |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 498 | 30.0% |
| a | 333 | 20.0% |
| l | 333 | 20.0% |
| M | 168 | 10.1% |
| F | 165 | 9.9% |
| m | 165 | 9.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1662 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1662 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1662 |
Interactions
Correlations
| | bill_depth_mm | bill_length_mm | body_mass_g | flipper_length_mm | island | sex | species |
|---|---|---|---|---|---|---|---|
| bill_depth_mm | 1.000 | -0.222 | -0.432 | -0.523 | 0.484 | 0.586 | 0.635 |
| bill_length_mm | -0.222 | 1.000 | 0.584 | 0.673 | 0.324 | 0.520 | 0.650 |
| body_mass_g | -0.432 | 0.584 | 1.000 | 0.840 | 0.456 | 0.589 | 0.605 |
| flipper_length_mm | -0.523 | 0.673 | 0.840 | 1.000 | 0.501 | 0.448 | 0.701 |
| island | 0.484 | 0.324 | 0.456 | 0.501 | 1.000 | 0.000 | 0.657 |
| sex | 0.586 | 0.520 | 0.589 | 0.448 | 0.000 | 1.000 | 0.000 |
| species | 0.635 | 0.650 | 0.605 | 0.701 | 0.657 | 0.000 | 1.000 |
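The "High correlation" alerts in the Overview appear to follow from this matrix. The sketch below counts, for each variable, how many other variables have an absolute coefficient of at least 0.5. That cut-off is an assumption (the report does not state it), but the resulting counts agree with the alert texts, for example five partners for flipper_length_mm ("bill_depth_mm and 4 other fields").

```python
# Correlation matrix copied from the table above (one list per row).
columns = ["bill_depth_mm", "bill_length_mm", "body_mass_g",
           "flipper_length_mm", "island", "sex", "species"]
rows = {
    "bill_depth_mm":     [1.000, -0.222, -0.432, -0.523, 0.484, 0.586, 0.635],
    "bill_length_mm":    [-0.222, 1.000, 0.584, 0.673, 0.324, 0.520, 0.650],
    "body_mass_g":       [-0.432, 0.584, 1.000, 0.840, 0.456, 0.589, 0.605],
    "flipper_length_mm": [-0.523, 0.673, 0.840, 1.000, 0.501, 0.448, 0.701],
    "island":            [0.484, 0.324, 0.456, 0.501, 1.000, 0.000, 0.657],
    "sex":               [0.586, 0.520, 0.589, 0.448, 0.000, 1.000, 0.000],
    "species":           [0.635, 0.650, 0.605, 0.701, 0.657, 0.000, 1.000],
}

THRESHOLD = 0.5  # assumed cut-off; not stated in the report itself


def strong_partners(name):
    """Return the other variables whose |coefficient| with `name` meets the cut-off."""
    return [other for other, value in zip(columns, rows[name])
            if other != name and abs(value) >= THRESHOLD]


for name in columns:
    partners = strong_partners(name)
    print(f"{name}: {len(partners)} -> {', '.join(partners)}")
```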
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
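To make nullity correlation concrete, the sketch below computes it for bill_length_mm and sex using only the first ten rows listed in the Sample section. It assumes nullity correlation is the Pearson correlation between the two columns' missing-value indicators; ten rows are far too few for a meaningful estimate, so this only illustrates the mechanics.

```python
from math import sqrt

# Missingness flags (1 = NaN) for rows 0-9 of the Sample section.
bill_length_missing = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
sex_missing = [0, 0, 0, 1, 0, 0, 0, 0, 1, 1]


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(bill_length_missing, sex_missing)
print(round(r, 3))
```

A positive value here reflects that the one row missing bill_length_mm is also missing sex, while sex is additionally missing on rows where the measurements are present.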
Sample
| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |
| 5 | Adelie | Torgersen | 39.3 | 20.6 | 190.0 | 3650.0 | Male |
| 6 | Adelie | Torgersen | 38.9 | 17.8 | 181.0 | 3625.0 | Female |
| 7 | Adelie | Torgersen | 39.2 | 19.6 | 195.0 | 4675.0 | Male |
| 8 | Adelie | Torgersen | 34.1 | 18.1 | 193.0 | 3475.0 | NaN |
| 9 | Adelie | Torgersen | 42.0 | 20.2 | 190.0 | 4250.0 | NaN |
| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 334 | Gentoo | Biscoe | 46.2 | 14.1 | 217.0 | 4375.0 | Female |
| 335 | Gentoo | Biscoe | 55.1 | 16.0 | 230.0 | 5850.0 | Male |
| 336 | Gentoo | Biscoe | 44.5 | 15.7 | 217.0 | 4875.0 | NaN |
| 337 | Gentoo | Biscoe | 48.8 | 16.2 | 222.0 | 6000.0 | Male |
| 338 | Gentoo | Biscoe | 47.2 | 13.7 | 214.0 | 4925.0 | Female |
| 339 | Gentoo | Biscoe | NaN | NaN | NaN | NaN | NaN |
| 340 | Gentoo | Biscoe | 46.8 | 14.3 | 215.0 | 4850.0 | Female |
| 341 | Gentoo | Biscoe | 50.4 | 15.7 | 222.0 | 5750.0 | Male |
| 342 | Gentoo | Biscoe | 45.2 | 14.8 | 212.0 | 5200.0 | Female |
| 343 | Gentoo | Biscoe | 49.9 | 16.1 | 213.0 | 5400.0 | Male |