STAT20059 · Week 3

Describing Data

Measures of center, spread, and distribution shape

Learning Objectives

By the end of this week

  • Calculate mean, median, and mode
  • Compare when each measure of center is most appropriate
  • Interpret range, variance, and standard deviation

Applied focus

  • Read summaries in business reports correctly
  • Spot misleading conclusions from averages alone
  • Communicate results in plain language

Why Summarise Data?

Center

Where is the middle of the data?

Spread

How much variability is there?

Shape

Is the distribution symmetric or skewed?

Good analysis needs all three: center + spread + shape.

Measures of Center

Measure | Definition                       | Best Used When
Mean    | Arithmetic average of all values | Data is roughly symmetric; no extreme outliers
Median  | Middle value when ordered        | Data is skewed or has outliers
Mode    | Most frequent value/category     | Categorical data or repeated values are important
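
The mode is the one measure of center that works on categories. As a minimal sketch, the Python standard library's statistics module can find it; the payment and shirt-size lists below are made-up illustration data, not course data.

```python
import statistics

# Hypothetical categorical data: payment method used in six transactions.
payment_methods = ["card", "cash", "card", "transfer", "card", "cash"]

# mode() returns the single most frequent value.
most_common = statistics.mode(payment_methods)
print(most_common)  # card

# multimode() returns every value tied for most frequent,
# in the order each value first appears.
shirt_sizes = ["S", "M", "M", "L", "L"]
tied_modes = statistics.multimode(shirt_sizes)
print(tied_modes)  # ['M', 'L']
```

A mean or median of payment methods would be meaningless, which is why the mode is the usual choice for categorical data.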

Worked Example: Mean and Median

Mean

(18 + 20 + 22 + 23 + 24 + 26 + 75) / 7 = 208 / 7 ≈ 29.7

Pulled upward by one large value (75).

Median

Ordered data -> middle (4th) value = 23

More representative of a typical week here.
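
To check the arithmetic, the same seven values can be run through Python's statistics module. This is a small sketch; the variable names are illustrative, and dropping the value 75 is only a way to show how strongly one outlier moves the mean.

```python
import statistics

# The seven values from the worked example.
weekly_values = [18, 20, 22, 23, 24, 26, 75]

mean_value = statistics.mean(weekly_values)
median_value = statistics.median(weekly_values)
print(round(mean_value, 1))  # 29.7
print(median_value)          # 23

# Same data without the one large value (75).
without_outlier = [v for v in weekly_values if v != 75]
mean_without = statistics.mean(without_outlier)
median_without = statistics.median(without_outlier)
print(round(mean_without, 1))  # 22.2
print(median_without)          # 22.5
```

Removing one value shifts the mean by about 7.5 but the median by only 0.5, which is what "robust to outliers" means in practice.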

Quick Quiz 1

A salary dataset has a few very high executive salaries. Which measure of center is usually better for reporting a "typical" salary?

Measures of Spread

Range

Range = max - min

Simple, but based on only two values.

Variance

Average squared distance from the mean.

In squared units; useful for calculations.

Standard Deviation

Square root of variance. It measures typical distance from the mean in the original unit.
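
A short sketch of all three spread measures, reusing the seven values from the worked example. One assumption to flag: "average squared distance" can mean dividing by n (population variance) or by n - 1 (sample variance), so both are shown here.

```python
import statistics

# The seven values from the worked example.
weekly_values = [18, 20, 22, 23, 24, 26, 75]

# Range: uses only the largest and smallest value.
value_range = max(weekly_values) - min(weekly_values)
print(value_range)  # 57

# Population variance divides by n; sample variance divides by n - 1.
population_variance = statistics.pvariance(weekly_values)
sample_variance = statistics.variance(weekly_values)

# Standard deviation is the square root of the matching variance,
# so it is back in the original unit of the data.
population_sd = statistics.pstdev(weekly_values)
sample_sd = statistics.stdev(weekly_values)

print(round(population_variance, 2), round(population_sd, 2))
print(round(sample_variance, 2), round(sample_sd, 2))
```

Business reports usually work with a sample, so the n - 1 version is the more common default in software, but it is worth checking which one a given tool reports.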

Interpreting Standard Deviation

Interpretation

With a mean of 68 and an SD of 4, many scores fall within one SD of the mean (68 ± 4), so roughly between 64 and 72.

Practical message

Small SD -> consistent performance. Large SD -> highly variable performance.
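
The 68 ± 4 reading can be tried out on a small dataset. The ten scores below are invented so that the mean is 68 and the population SD is 4; they are an illustration only, not course data.

```python
import statistics

# Hypothetical scores chosen so that mean = 68 and population SD = 4.
scores = [62, 62, 66, 66, 68, 68, 70, 70, 74, 74]

mean_score = statistics.mean(scores)
sd_score = statistics.pstdev(scores)

# Interval of one SD on either side of the mean.
lower = mean_score - sd_score
upper = mean_score + sd_score

# Share of scores that fall inside that interval.
inside = [s for s in scores if lower <= s <= upper]
share_inside = len(inside) / len(scores)

print(mean_score, sd_score)  # 68 4.0
print(lower, upper)          # 64.0 72.0
print(share_inside)          # 0.6
```

Here 6 of the 10 scores sit within one SD of the mean. The exact share depends on the shape of the data, so "many scores" is a safer phrase than any fixed percentage.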

Skewness and Outliers

Pattern      | Visual Shape            | Typical Relationship
Symmetric    | Balanced left and right | Mean approximately Median
Right-skewed | Long tail to the right  | Mean > Median
Left-skewed  | Long tail to the left   | Mean < Median

Always comment on outliers before choosing summary statistics.
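
The mean-versus-median comparison in the table can be turned into a rough check. This is a rule-of-thumb sketch only, not a formal skewness statistic; the function name skew_direction and its tolerance parameter are hypothetical, introduced here for illustration.

```python
import statistics

def skew_direction(values, tolerance=0.1):
    """Guess skew direction by comparing the mean with the median.

    The gap counts as "approximately equal" when it is no larger than
    tolerance * SD. The 0.1 default is an arbitrary assumption.
    """
    gap = statistics.mean(values) - statistics.median(values)
    threshold = tolerance * statistics.pstdev(values)
    if abs(gap) <= threshold:
        return "symmetric"
    return "right-skewed" if gap > 0 else "left-skewed"

# Worked-example data: the value 75 pulls the mean above the median.
example_result = skew_direction([18, 20, 22, 23, 24, 26, 75])
print(example_result)  # right-skewed

# Hypothetical data with one unusually small value.
low_outlier_result = skew_direction([1, 50, 52, 54, 55])
print(low_outlier_result)  # left-skewed
```

A histogram or box plot should still be the first check, since the mean and median can agree even when the distribution has an unusual shape.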

Quick Quiz 2

If two datasets have the same mean, but Dataset A has a much larger standard deviation than Dataset B, what does that imply?

Week 3 Takeaways

Center

  • Mean uses all values, so it is sensitive to outliers
  • Median is robust for skewed data
  • Mode helps with common categories

Spread and Shape

  • Range, variance, and SD quantify variability
  • Skewness affects interpretation
  • Never report one summary measure in isolation

Next: apply these ideas to probability distributions and inference.