Week 2 · Tutorial

Presenting Data

Guided activities — from raw numbers to meaningful charts
✏️ By hand
📊 Excel
7 activities
60 min
2 / 28

Today's Tutorial Roadmap

✏️ By Hand
Stem-and-Leaf
Activity 2 · PP2.9 · PP2.14
✏️ By Hand
Frequency Distribution
Activity 1 · Activity 3 · RP2.8 · PP2.4
📊 Excel
Histogram & Ogive
Activity 4 · RP2.9
📊 Excel
Scatter Plot
PP2.16
📊 Excel
Bar & Pie Charts
PP2.11 · PP2.10
✏️ Both
Pareto Chart
Activity 6
Running Thread
Factory Workers
Same dataset → 3 different charts
Key Skill
Chart Selection
Right chart for right question
Running dataset: Activities 2, 3 and 4 all use the same 29 factory workers' commute times — so you'll see how the same data looks through three different lenses.
Section 1 of 5

Stem-and-Leaf Plots

See every individual value and the distribution shape at the same time.
Activity 2 — Factory Workers
PP2.9 — Two-digit stems
PP2.14 — Decimal stems
4 / 28
✏️ Activity 2 — By Hand

Factory Workers: Commute Times (min)

A sample of 29 factory workers were asked how long they commute from home to work on average.

50
30
73
79
5
9
42
10
15
35
51
68
69
32
46
54
65
44
23
36
77
61
59
56
48
58
21
11
46

Task: Construct a stem-and-leaf plot of this data.

Step 1 — Sort

Order the values from smallest to largest before you begin.

Step 2 — Choose stems

Values range from 5 to 79. Use the tens digit (0, 1, 2, … 7) as stems.

Step 3 — Add leaves

The units digit of each value becomes the leaf next to its stem.

Sorted values: 5, 9, 10, 11, 15, 21, 23, 30, 32, 35, 36, 42, 44, 46, 46, 48, 50, 51, 54, 56, 58, 59, 61, 65, 68, 69, 73, 77, 79
0
5 9
1
0 1 5
2
1 3
3
0 2 5 6
4
2 4 6 6 8
5
0 1 4 6 8 9
6
1 5 8 9
7
3 7 9

Interpretation

  • Most workers commute 40–60 minutes
  • Distribution is roughly symmetric
  • Note the repeat value: two workers commute exactly 46 minutes
  • No extreme outliers
5 / 28

Extension: Different Stem Choices

✏️ PP2.9 — Two-digit stems

The challenge

When data values are large (e.g. 394, 388, 392…), the standard single-digit stem produces too few rows. Use the first two digits as the stem.

Rule

Round to the appropriate number of significant figures first, then split: all digits except the last = stem, last digit = leaf.

✏️ PP2.14 — Decimal stems

Monthly downloads (MB) — decimal data

When data has a decimal component, use the whole number as the stem and the decimal digit as the leaf.

Example logic

Value = 3.7 → Stem = 3, Leaf = 7
Value = 12.4 → Stem = 12, Leaf = 4

Key question for PP2.14: After constructing the plot, what does it tell you about typical monthly downloads and the spread of values?
Section 2 of 5

Frequency Distributions

Group continuous data into classes — then calculate frequencies, relative frequencies, and cumulative percentages.
Activity 3 — Factory Workers (same data!)
Activity 1 — GDP Classes
RP2.8 — Metal Rings
PP2.4 — Absenteeism
7 / 28
✏️ Activity 3 — By Hand

Same Data → Frequency Distribution

Same 29 workers as Activity 2! Now we group the data instead of preserving individual values. Use "0 to less than 10" as the first class.
Class width = Range ÷ Classes

Your Steps

  1. 1
    Find range: Max − Min = ?
  2. 2
    Decide on number of classes (hint: 8 works well)
  3. 3
    Confirm class width = 10 (given by the first class 0–<10)
  4. 4
    Tally each of the 29 values into its class
  5. 5
    Check: all frequencies must sum to 29
Count per class:
0–<10: 2 values (5, 9)
10–<20: 3 values (10, 11, 15)
20–<30: 2 values (21, 23)
30–<40: 4 values (30, 32, 35, 36)
Commute Time (min)Count%Cum %
0 to <1026.9%6.9%
10 to <20310.3%17.2%
20 to <3026.9%24.1%
30 to <40413.8%37.9%
40 to <50517.2%55.2%
50 to <60620.7%75.9%
60 to <70413.8%89.7%
70 to <80310.3%100%
Total29100%
8 / 28
✏️ RP2.8 — By Hand

Metal Rings: Five-Class Frequency Distribution

A manufacturer samples 50 metal rings (target weight ~1.4 kg). Construct a frequency distribution using FIVE classes.

1.51
1.59
1.42
1.25
1.34
1.51
1.51
1.19
1.62
1.31
1.56
1.16
1.25
1.48
1.59
1.42
1.62
1.25
1.31
1.16
1.48
1.96
1.51
1.62
1.45
1.53
1.79
1.19
1.34
1.34
1.48
1.51
1.31
1.02
1.65
1.45
1.08
1.39
1.42
1.76
1.11
1.25
1.56
1.22
1.48
1.22
1.19
1.62
1.39
1.45

Your working

  • Min = 1.02, Max = 1.96
  • Range = 1.96 − 1.02 = 0.94
  • Width = 0.94 ÷ 5 ≈ 0.19 → round to 0.20
  • Start point: 1.00 (round down)
Weight (kg)CountObservation
1.00–<1.206Underweight
1.20–<1.4014Below target
1.40–<1.6022Near target ✓
1.60–<1.806Above target
1.80–<2.002Overweight
Total50
9 / 28
✏️ PP2.4 — By Hand

Absenteeism: Midpoints, Relative & Cumulative Frequency

500 employee records examined. For each class: determine the class midpoint, relative frequency, and cumulative frequency.

Class intervalFrequencyMidpointRel. Freq.Cum. Freq.
(0, 2]218???
(2, 4]207???
(4, 6]56???
(6, 8]11???
(8, 10]8???
Total5001.000500

Formulas to use

Midpoint = (Lower + Upper) ÷ 2
Relative Freq = Class Freq ÷ Total
Cumulative Freq = Running total
ClassFreqMidptRel. Freq.Cum. Freq.
(0, 2]21810.436218
(2, 4]20730.414425
(4, 6]5650.112481
(6, 8]1170.022492
(8, 10]890.016500
5001.000500

Key insight

85% of employees (425 out of 500) took 4 or fewer days off. The distribution is strongly right-skewed — most people have low absenteeism, but a few have much higher.

Section 3 of 5

Histogram, Frequency Polygon & Ogive

Turn your frequency table into a chart — by hand first, then in Excel.
Activity 4 — Factory workers histogram
RP2.9 — NZ house prices
11 / 28
✏️ + 📊 Activity 4 — By Hand, then Excel

Factory Workers: Draw a Histogram

Use your frequency distribution from Activity 3 (same 29 workers' commute times). First class: 0 to less than 10.

By Hand — Rules to Remember

  • Bars must touch — no gaps (continuous data)
  • Bar height = frequency (or relative frequency)
  • All bars must be equal width
  • Label both axes with units
  • Give the chart a title

📊 Excel Steps

  1. 1
    Enter the raw data in Column A
  2. 2
    Insert → Charts → Histogram
  3. 3
    Right-click x-axis → Format Axis → Set bin width to 10
  4. 4
    Add axis labels and a title
Expected Shape — Commute Times
2 3 2 4 5 6 4 3 0-10 20-30 40-50 60-70
Discussion: Compare your by-hand histogram with the stem-and-leaf from Activity 2. What do both show about the commute time distribution? What does one show that the other doesn't?
12 / 28
📊 RP2.9 — Excel Recommended

NZ House Prices: Frequency Polygon & Ogive

90 houses for sale in a NZ city, each ~180m². Use the frequency distribution provided to construct both charts.

Frequency Polygon — Key Rules

  • Plot frequency against class midpoint (not boundary)
  • Connect points with straight lines
  • Close the polygon at both ends by adding hypothetical classes with zero frequency

Ogive — Key Rules

  • Plot cumulative % against the upper class boundary
  • Starts at 0% (at lower boundary of first class)
  • Ends at 100% (at upper boundary of last class)
  • Use for answering "what % are below $X?"

📊 Excel Tip — Ogive

  1. 1
    Build frequency table; add Cumulative % column
  2. 2
    Select upper boundaries + Cum % columns
  3. 3
    Insert → Charts → Line with Markers
  4. 4
    Format axes, add labels, title
Discussion question: Using your ogive, estimate what percentage of houses are priced below $300,000. Then estimate the median house price.
Section 4 of 5

Scatter Plot

Investigating relationships between two numerical variables.
PP2.16 — Advertising vs Sales Revenue
14 / 28
📊 PP2.16 — Excel Recommended

Advertising Spend vs Sales Revenue

Is total sales revenue related to advertising spend? Construct a scatter plot and discuss the relationship.

Advertising ($M)Sales Revenue ($M)
4.2155.7
1.687.3
6.3135.6
2.799.0
10.4168.2
7.1136.9
5.5101.4
8.3158.2

📊 Excel Steps

  1. 1
    Put Advertising in Col A, Sales in Col B
  2. 2
    Select both columns → Insert → Scatter (X,Y)
  3. 3
    Add axis titles and a chart title
  4. 4
    Right-click a point → Add Trendline → Linear
Advertising vs Sales — Expected Pattern
$1M $10M Advertising ($M) Sales ($M)

Relationship

There is a positive relationship between advertising spend and sales — higher advertising tends to produce higher revenue. However, the relationship is not perfectly linear (e.g. $6.3M spend produces less sales than $5.5M). We'll quantify this relationship in Week 10 (correlation & regression).

Section 5 of 5

Bar, Pie & Pareto Charts

Visualising categorical data — and using the 80/20 rule to find what matters most.
PP2.11 — Australia Imports (Bar)
PP2.10 — Australia Exports (Pie)
Activity 6 — Restaurant Complaints (Pareto)
16 / 28
📊 PP2.11 — Excel

Australia's Top 10 Import Sources

Import SourceA$ million
China49,329
USA39,181
Japan21,221
Singapore17,878
Thailand13,832
Germany13,099
UK12,044
Malaysia10,944
South Korea10,813
New Zealand10,532

📊 Excel Steps

  1. 1
    Enter Country in Col A, Value in Col B
  2. 2
    Sort by value descending (Data → Sort)
  3. 3
    Select both columns → Insert → Bar Chart (horizontal)
  4. 4
    Label axes and add a title

Horizontal or Vertical?

With long country names, a horizontal bar chart (country names on y-axis) is easier to read than a vertical one. Excel calls this a "Bar Chart" rather than "Column Chart".

Discussion: After building this chart alongside the exports chart (PP2.10), which countries are Australia's most important two-way trading partners?
17 / 28
📊 PP2.10 — Excel

Australia's Top 10 Export Destinations

Export MarketA$ million% (approx)
China94,65545.1%
Japan47,50122.6%
South Korea19,6109.3%
USA9,5804.6%
India9,5174.5%
New Zealand7,3993.5%
Taiwan7,3563.5%
Singapore5,6592.7%
Malaysia5,5612.6%
UK3,8591.8%
Total210,697100%

📊 Excel Steps

  1. 1
    Insert → Charts → Pie → 2-D Pie
  2. 2
    Right-click slices → Add Data Labels → Show %
  3. 3
    Give the chart a meaningful title

Is a pie chart effective here?

  • China dominates at 45% — this is obvious in any chart
  • But comparing the 8 small slices (2–9% each) is very difficult in a pie
  • A bar chart sorted by value would make comparisons clearer
  • Pie charts work best with 3–5 categories, not 10
18 / 28
✏️ + 📊 Activity 6 — By Hand + Excel

Restaurant Complaints: Build a Pareto Chart

A restaurant owner reviews complaint forms from the past year. Construct a Pareto chart to identify the most important complaints to fix first.

Complaint TypePercentage (%)
Long waiting time9%
Food too salty6%
Overpriced26%
Small portions21%
Food not fresh16%
No atmosphere12%
Unfriendly staff7%
Other3%

Step 1 — Sort first!

Before drawing anything, re-order categories from largest to smallest percentage. This is the defining feature of a Pareto chart.

By Hand — What to Draw

  1. 1
    Sort categories largest → smallest
  2. 2
    Draw bars (sorted) on left y-axis (%)
  3. 3
    Calculate running cumulative %
  4. 4
    Plot cumulative % line on right y-axis
  5. 5
    Draw a reference line at 80%
Complaint%Cum %
Overpriced2626
Small portions2147
Food not fresh1663
No atmosphere1275
Long waiting time984
Unfriendly staff791
Food too salty697
Other3100
19 / 28

Activity 6: Pareto Chart & Interpretation

Restaurant Complaints — Pareto Chart
0% 25% 50% 75% 100% 0% 25% 50% 75% 100% 80% Overpriced Sm.Portions Not Fresh No Atmosph. Long Wait Unfriendly Too Salty Other

The 80/20 Reading

The cumulative line crosses 80% after the 5th bar. So 5 complaint types account for 84% of all complaints:

  1. Overpriced (26%)
  2. Small portions (21%)
  3. Food not fresh (16%)
  4. No atmosphere (12%)
  5. Long waiting time (9%)
Management implication: The owner should focus on pricing and portion size first — together they account for 47% of all complaints. Fixing salty food or rudeness won't move the needle much.
20 / 28
📊 Activity 6 — Excel Version

Building the Pareto Chart in Excel

Method 1: Excel Built-in (Office 2016+)

  1. 1
    Enter complaint names in Col A, percentages in Col B (unsorted is fine)
  2. 2
    Select the data → Insert → Statistical Chart → Pareto
  3. 3
    Excel automatically sorts and adds the cumulative line
  4. 4
    Format: add title, label axes

Method 2: Manual (any version)

  1. 1
    Sort data descending by frequency
  2. 2
    Add a Cumulative % column
  3. 3
    Create a Column Chart from sorted frequencies
  4. 4
    Add Cum% as a second series → Change type to Line → Secondary axis

Common Excel Mistakes

  • Forgetting to set secondary y-axis for the cumulative line
  • Not matching the scale: 100% on left must align with 100% on right
  • Leaving gaps between bars (change series options: Gap Width = 0)
  • Forgetting to sort the categories first (in Method 2)
Exam tip: In a written exam you may be asked to interpret a Pareto chart rather than draw one. Always read: (1) what are the top 1–2 categories? (2) Where does the 80% line intersect?
21 / 28

Discussion: Bar vs Pie vs Pareto

Use the restaurant data as a test case. Which chart best answers each question?

QuestionBest ChartWhy?
"What % of complaints are about price?"Bar or ParetoEasy to read one bar's value
"Which 3 complaints should we fix first?"ParetoAlready sorted, cumulative line shows priority
"What fraction of ALL complaints is food-related?"PiePart-to-whole with grouped categories works
"How do complaints compare across 3 restaurant branches?"Grouped barSide-by-side comparison across groups
Key principle: The Pareto chart outperforms a plain bar chart whenever the goal is to prioritise — it does the sorting for you and shows the cumulative impact.

The 80/20 Rule (Pareto Principle)

Roughly 80% of effects come from 20% of causes. In auditing: 80% of errors come from 20% of account types. In quality control: 80% of defects come from 20% of processes.

22 / 28

Reflection: Same Data, Three Different Lenses

Activities 2, 3 and 4 all used the same 29 workers' commute times. Each chart gave you different information.
ChartActivityWhat it showsWhat it hides
Stem-and-leafActivity 2Every individual value, distribution shape, exact duplicatesLarge datasets become unwieldy
Frequency tableActivity 3Count and % per class, cumulative %Individual values within each class
HistogramActivity 4Visual distribution shape, skewness at a glanceIndividual values, exact counts require axis reading

Which would you choose for an audit report?

The histogram is best for the report — it's visual and immediately communicates shape and spread. The stem-and-leaf is best for field work where you need to inspect individual values quickly. The frequency table is the source of truth behind both.

23 / 28

📊 Excel Quick Reference

Summary of all Excel chart types covered today — use this as a checklist when completing exercises at home.

Chart NeededExcel Insert PathKey Setting to Change
HistogramInsert → Statistical Charts → HistogramRight-click x-axis → Format Axis → Bin width
Frequency PolygonInsert → Charts → Line with Markersx-axis values = class midpoints
OgiveInsert → Charts → Line with Markersx-axis = upper boundaries, y = cumulative %
Scatter plotInsert → Charts → Scatter (X,Y)Right-click → Add Trendline → Linear
Bar chartInsert → Charts → Bar (horizontal)Sort data descending before charting
Pie chartInsert → Charts → Pie → 2D PieAdd Data Labels → Show Percentage
Pareto chartInsert → Statistical Charts → ParetoAuto-sorted; check secondary axis alignment
Always: Add a descriptive title, label both axes with variable names and units, and check that your chart answers the question you're trying to answer before submitting.
24 / 28

Common Errors to Avoid

Stem-and-Leaf Errors

  • Not sorting the leaves within each row
  • Using more than one digit as a leaf
  • Forgetting stems that have no leaves (must still appear)
  • Using commas between leaves (not allowed)

Frequency Distribution Errors

  • Overlapping class boundaries (e.g. 0–10 and 10–20 both include 10)
  • Unequal class widths (unless explicitly required)
  • Frequencies that don't sum to n
  • Choosing too many or too few classes

Histogram Errors

  • Gaps between bars (not allowed — continuous data)
  • Y-axis not starting at zero
  • Missing axis labels or chart title

Pareto Chart Errors

  • Bars not sorted largest → smallest
  • Cumulative line not matching 100% at the right
  • Treating a Pareto as just a bar chart (forgetting the cumulative line)
25 / 28

Brain Teasers

Think through these before clicking reveal — they test deeper understanding beyond procedure.

1. A student builds a stem-and-leaf display and then a histogram from the same data. They notice the histogram looks symmetric but the stem-and-leaf shows a cluster of repeated values at the low end. How is this possible?
The histogram grouped values into classes, which can mask patterns within a class. Repeated values at the low end might all fall in the same class, making that bar's height look normal. The stem-and-leaf preserves individual values and reveals the clustering. This is why we always check the stem-and-leaf before finalising a histogram.
2. An accountant's Pareto chart of expense exceptions shows the top 2 categories hitting 79% — just under 80%. Their manager says "the 80/20 rule doesn't apply here." Are they right?
No. The "80/20 rule" is a rough guideline, not an exact threshold. The point is that a small number of categories drive most of the impact. At 79%, the principle still clearly applies — focusing on those 2 categories addresses the vast majority of exceptions. The exact percentage matters less than the insight.
3. Why might an ogive be more useful than a histogram for answering "what percentage of employees take more than 5 days off?"
A histogram shows frequency per class — to answer a threshold question, you'd need to add up multiple bars and estimate within-class proportions. An ogive directly shows the cumulative percentage at any boundary point. You simply read off "% below 5 days" directly, then subtract from 100% to get "more than 5 days". It's purpose-built for this type of question.
26 / 28

Quick Challenge: Pick the Right Chart

For each accounting scenario below, identify the best chart type and briefly explain why. Discuss with your group.

#ScenarioBest Chart
1You have 25 sampled invoice amounts and want to check for outliers while preserving every value.Your answer →
2You want to show what % of a client's costs are below $10,000.Your answer →
3You want to show the monthly pattern of late payments over a full year.Your answer →
4You need to identify which 3 audit exception types account for most of the problem.Your answer →
5You want to know if advertising spend predicts sales for 8 industry companies.Your answer →
6You have 4 business units and want to show their share of total revenue.Your answer →

1

Stem-and-leaf — preserves individual values

2

Ogive — designed for cumulative % threshold questions

3

Time series — time on x-axis, shows trend

4

Pareto — sorted with cumulative line, 80/20 rule

5

Scatter plot — relationship between two numerical variables

6

Pie chart — 4 categories, part-to-whole, manageable
27 / 28

Assessment Tips for Week 2 Topics

In Written Exams

  • Always show your class width calculation
  • Leave no stems out of a stem-and-leaf (even empty ones)
  • Add axis labels, units, and a title to every chart
  • For pie charts: state why it is or isn't effective for that data
  • For Pareto: identify how many categories reach 80%

When Interpreting Charts

  • Always state the shape (symmetric / skewed?)
  • Identify any outliers or unusual values
  • Give an accounting implication (not just a description)

In Excel Assignments

  • Set histogram bin width to match your class width exactly
  • For scatter: include a trendline to show direction of relationship
  • For Pareto: check secondary y-axis aligns 100% left = 100% right
  • Never leave default chart titles like "Chart 1"
The key question for any chart: Does this graph answer the question I'm trying to answer? If not, choose a different chart.
Week 2 Tutorial Complete

Before Next Week

Review
Re-read lecture slides for any chart type that wasn't clear today
Practice
Complete RP2.8 and PP2.16 in Excel if you didn't finish in class
Excel
Build a Pareto chart from any dataset of your choice — try it both ways (built-in and manual)
Think
Next week: measures of centre and spread — mean, median, standard deviation