STAT20059 · Week 3
Describing Data
Measures of center, spread, and distribution shape
Learning Objectives
By the end of this week
- Calculate mean, median, and mode
- Decide when each measure of center is most appropriate
- Interpret range, variance, and standard deviation
Applied focus
- Read summaries in business reports correctly
- Spot misleading conclusions from averages alone
- Communicate results in plain language
Why Summarise Data?
Raw values are hard to compare. Summary statistics help us see the signal quickly.
Center
Where is the middle of the data?
Spread
How much variability is there?
Shape
Is the distribution symmetric or skewed?
Good analysis needs all three: center + spread + shape.
Measures of Center
| Measure | Definition | Best Used When |
| --- | --- | --- |
| Mean | Arithmetic average of all values | Data is roughly symmetric; no extreme outliers |
| Median | Middle value when ordered | Data is skewed or has outliers |
| Mode | Most frequent value/category | Categorical data or repeated values are important |
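The mode is the one center measure that works on categories. A minimal sketch using Python's standard `statistics` module; the payment data is made up for illustration and is not from the course.

```python
from statistics import mode, multimode

# Hypothetical payment methods recorded at a checkout (illustrative data only)
payments = ["card", "cash", "card", "app", "card", "cash"]

# mode() returns the single most frequent value
most_common = mode(payments)
print(most_common)  # card

# multimode() returns every value tied for most frequent
ties = multimode([1, 1, 2, 2, 3])
print(ties)  # [1, 2]
```

A mean or median of `payments` would be meaningless, which is why the mode is the usual choice for categorical data.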
Worked Example: Mean and Median
Sample weekly sales ($000): 18, 20, 22, 23, 24, 26, 75
Mean
(18 + 20 + 22 + 23 + 24 + 26 + 75) / 7 = 208 / 7 ≈ 29.7
Pulled upward by one large value (75).
Median
Ordered data -> middle (4th) value = 23
More representative of a typical week here.
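The worked example can be checked in a few lines. This sketch uses Python's standard `statistics` module and also recomputes both measures with the large value removed, to show how much each one moves; the variable names are illustrative.

```python
from statistics import mean, median

# Weekly sales ($000) from the worked example
sales = [18, 20, 22, 23, 24, 26, 75]

mean_all = mean(sales)
median_all = median(sales)
print(round(mean_all, 1), median_all)  # 29.7 23

# Drop the one large value (75) and recompute
sales_trimmed = [x for x in sales if x != 75]
mean_trimmed = mean(sales_trimmed)
median_trimmed = median(sales_trimmed)
print(round(mean_trimmed, 1), median_trimmed)  # 22.2 22.5
```

The mean drops by about 7.5 when the large value is removed, while the median changes by only 0.5.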
Quick Quiz 1
A salary dataset has a few very high executive salaries. Which measure of center is usually better for reporting a "typical" salary?
Measures of Spread
Range
Range = max - min
Simple, but based on only two values.
Variance
Average squared distance from the mean (for a sample, the sum of squared distances is divided by n - 1 rather than n).
In squared units; useful for calculations.
Standard Deviation
Square root of variance. It measures typical distance from the mean in the original unit.
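As a rough illustration, the three spread measures can be computed for the weekly sales data from the earlier worked example. The sketch assumes the data is a sample, so it uses `statistics.variance` and `statistics.stdev` (which divide by n - 1); `pvariance` is shown for comparison when the data is treated as a full population.

```python
from statistics import pvariance, stdev, variance

# Weekly sales ($000) from the worked example
sales = [18, 20, 22, 23, 24, 26, 75]

# Range: uses only the largest and smallest values
data_range = max(sales) - min(sales)
print(data_range)  # 57

# Sample variance (divides by n - 1), in squared units
sample_var = variance(sales)
print(round(sample_var, 2))  # 405.57

# Population variance (divides by n) is slightly smaller
pop_var = pvariance(sales)

# Standard deviation: square root of the variance, back in $000
sample_sd = stdev(sales)
print(round(sample_sd, 2))  # 20.14
```

The single value of 75 drives most of the variance here, so spread measures based on the mean are sensitive to outliers in the same way the mean is.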
Interpreting Standard Deviation
Mean exam score = 68, standard deviation = 4
Interpretation
If scores are roughly bell-shaped, about two-thirds fall within one SD of the mean: 68 ± 4, i.e. between 64 and 72.
Practical message
Small SD -> consistent performance. Large SD -> highly variable performance.
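To make the practical message concrete, here is a small sketch with two hypothetical classes that share a mean of 68 but differ in spread. The scores are invented for illustration only.

```python
from statistics import mean, stdev

# Hypothetical exam scores (illustrative data, not from the course)
class_a = [64, 66, 68, 70, 72]   # tightly clustered
class_b = [48, 58, 68, 78, 88]   # widely spread

mean_a, mean_b = mean(class_a), mean(class_b)
sd_a, sd_b = stdev(class_a), stdev(class_b)

print(mean_a, mean_b)                  # 68 68
print(round(sd_a, 2), round(sd_b, 2))  # 3.16 15.81
```

Both classes have the same average, but a typical score in class B sits about five times further from that average than one in class A.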
Skewness and Outliers
| Pattern | Visual Shape | Typical Relationship |
| --- | --- | --- |
| Symmetric | Balanced left and right | Mean ≈ Median |
| Right-skewed | Long tail to the right | Mean > Median |
| Left-skewed | Long tail to the left | Mean < Median |
Always comment on outliers before choosing summary statistics.
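The mean-versus-median relationships in the table can be turned into a quick check. This is only a rough heuristic, not a formal skewness test: the function name `skew_direction` and the `tolerance` threshold are assumptions made for this sketch, and the second and third datasets are invented.

```python
from statistics import mean, median, stdev

def skew_direction(values, tolerance=0.1):
    """Guess skew from mean vs median (hypothetical helper, rough heuristic).

    The gap between mean and median is compared with the standard
    deviation; gaps smaller than tolerance * SD count as symmetric.
    """
    gap = mean(values) - median(values)
    if abs(gap) <= tolerance * stdev(values):
        return "roughly symmetric"
    return "right-skewed" if gap > 0 else "left-skewed"

sales = [18, 20, 22, 23, 24, 26, 75]      # worked example data
low_week = [2, 20, 22, 23, 24, 25, 26]    # invented: one very low value
balanced = [10, 20, 30, 40, 50]           # invented: evenly spaced

print(skew_direction(sales))     # right-skewed
print(skew_direction(low_week))  # left-skewed
print(skew_direction(balanced))  # roughly symmetric
```

A histogram or boxplot remains the more reliable way to judge shape; the comparison above only flags cases worth a closer look.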
Quick Quiz 2
If two datasets have the same mean, but Dataset A has a much larger standard deviation than Dataset B, what does that imply?
Week 3 Takeaways
Center
- Mean uses all values, so it is sensitive to outliers
- Median is robust for skewed data
- Mode identifies the most common value or category
Spread and Shape
- Range, variance, and SD quantify variability
- Skewness affects interpretation
- Never report one summary measure in isolation
Next: apply these ideas to probability distributions and inference.