Construct a stem-and-leaf plot by hand and interpret its shape
2
Build a frequency distribution and draw a histogram and ogive
3
Produce and interpret scatter plots and time series plots
4
Select and compare bar charts, pie charts and Pareto diagrams
5
Apply graphical excellence principles and avoid common errors
6
Use multidimensional visualisation to encode more than two variables
The MegaMart Audit
Running Scenario — used for every chart type throughout this lecture
The Client
MegaMart is a national retailer with 12 stores. Your firm has been engaged to audit employee expense claims submitted over the past financial year.
The Audit Question
Are there unusual patterns in claim amounts, categories, or timing that might indicate errors or irregularities?
Total claims: 568
Sample size: 20
Categories: 6
Our Tools Today
Stem-and-leaf — see every value
Histogram & ogive — see the shape
Pareto — find the biggest problems
Scatter & time series — spot trends
Raw Data → Information
MegaMart — 20 sampled expense claim amounts ($)
52, 67, 43, 89, 71, 55, 48, 93, 62, 77
58, 84, 46, 69, 75, 51, 88, 63, 72, 56
Step 1
Raw data — individual values, hard to interpret at a glance
Step 2
Organise — sort, group, tabulate
Step 3
Visualise — charts reveal patterns instantly
Step 4
Insight — auditor can draw conclusions
Key question throughout today: What does this data tell us about MegaMart's expense claims — and which chart tells it best?
Section 1 of 6
Stem-and-Leaf Plots
A chart that shows every individual value while still revealing the shape of the distribution, which makes it well suited to small samples.
What is a Stem-and-Leaf Plot?
The Idea
Split each data value into a stem (leading digits) and a leaf (final digit). The result is a display that acts like both a sorted list and a histogram.
Why Auditors Use It
Preserves every original value
Instantly shows the distribution shape
Highlights outliers (unusually high claims)
Quick to construct by hand in the field
Anatomy of a Stem-and-Leaf
Example: values 52, 55, 58
Stem | Leaves
5 | 2 5 8
Stem = tens digit (5 = $50s). Leaf = units digit.
Building the Stem-and-Leaf
MegaMart — 20 expense claim amounts, ordered: $43 to $93
Step 1. Sort values from smallest to largest. Step 2. Choose stems (tens digits: 4, 5, 6, 7, 8, 9). Step 3. Write each units digit as a leaf next to its stem.
Stem unit = $10. Leaf unit = $1. Red values = potential outliers for audit investigation.
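The three steps above can be sketched in a few lines of Python. This is a minimal illustration using the 20 sampled claim amounts; the function name `stem_and_leaf` and its `stem_unit` parameter are illustrative choices, not part of the lecture material.

```python
from collections import defaultdict

# The 20 sampled MegaMart claim amounts ($) from the raw-data slide.
claims = [52, 67, 43, 89, 71, 55, 48, 93, 62, 77,
          58, 84, 46, 69, 75, 51, 88, 63, 72, 56]

def stem_and_leaf(values, stem_unit=10):
    """Return {stem: [sorted leaves]} for non-negative whole numbers."""
    plot = defaultdict(list)
    for v in sorted(values):                 # Step 1: sort
        stem, leaf = divmod(v, stem_unit)    # Step 2: tens digit is the stem
        plot[stem].append(leaf)              # Step 3: units digit is the leaf
    # Keep empty stems between the smallest and largest so gaps stay visible.
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}

plot = stem_and_leaf(claims)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the values are sorted before they are split, the leaves on each row come out in ascending order, which is what makes the display readable as both a sorted list and a histogram.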
Reading the Stem-and-Leaf
MegaMart audit — what does the plot tell us?
4 | 3 6 8
5 | 1 2 5 6 8
6 | 2 3 7 9
7 | 1 2 5 7
8 | 4 8 9
9 | 3
Advantage over histogram: all original values are preserved.
Shape
Slight right skew: the largest cluster is in the $50s, and the counts thin out towards a tail of higher values.
Centre
The median falls in the $60s — around $65.
Audit Flag
The $80–$93 cluster (4 values) stands out. These 4 claims warrant closer review — are they legitimate or inflated?
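The centre and the audit flag can be checked numerically. The sketch below uses Python's standard `statistics` module on the same 20 claims; the $80 cut-off is taken from the slide, and treating it as a fixed review threshold is an assumption made for illustration.

```python
from statistics import mean, median

claims = [52, 67, 43, 89, 71, 55, 48, 93, 62, 77,
          58, 84, 46, 69, 75, 51, 88, 63, 72, 56]

centre = median(claims)    # midpoint of the 10th and 11th sorted values
average = mean(claims)     # arithmetic mean of all 20 claims
flagged = sorted(c for c in claims if c >= 80)   # claims in the high-value tail

print(f"median = {centre}, mean = {average:.2f}")
print(f"flagged for review: {flagged}")
```

A mean slightly above the median is consistent with a mild right skew, since a few larger claims pull the mean upward.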
Quiz 1 of 5
An auditor needs to examine 20 sampled invoice amounts while keeping every individual value visible. Which display is best suited?
A. Histogram
B. Stem-and-leaf plot
C. Pie chart
D. Time series plot
Section 2 of 6
Frequency Distributions & Histograms
Group your data into classes — then visualise the distribution as a bar chart with no gaps.
Building a Frequency Distribution
MegaMart — 20 expense claims, range $43–$93
Step-by-Step
1. Find range: $93 − $43 = $50
2. Choose classes: 6 classes
3. Class width: $50 ÷ 6 ≈ $8.33, rounded up to $10
4. Start point: $40 (round down)
5. Tally each value into its class
Rule of thumb: 5–20 classes; equal width; classes must not overlap.
Claim Amount ($) | Tally | Count
$40 to <$50 | /// | 3
$50 to <$60 | ///// | 5
$60 to <$70 | //// | 4
$70 to <$80 | //// | 4
$80 to <$90 | /// | 3
$90 to <$100 | / | 1
Total | | 20
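The tally can be reproduced programmatically. The following is a minimal sketch of the five steps, assuming the raw width is rounded up to the next multiple of $10 and the start point is the minimum rounded down to a multiple of that width; both rounding choices are conventions picked for this example.

```python
import math

claims = [52, 67, 43, 89, 71, 55, 48, 93, 62, 77,
          58, 84, 46, 69, 75, 51, 88, 63, 72, 56]

n_classes = 6
data_range = max(claims) - min(claims)        # Step 1: 93 - 43
raw_width = data_range / n_classes            # Steps 2-3: about 8.33
width = math.ceil(raw_width / 10) * 10        # round up to a convenient width
start = (min(claims) // width) * width        # Step 4: round the minimum down

# Step 5: each claim falls in class index (claim - start) // width,
# so a class covers "lower bound to < upper bound" with no overlap.
counts = [0] * n_classes
for c in claims:
    counts[(c - start) // width] += 1

for i, k in enumerate(counts):
    lower = start + i * width
    print(f"${lower} to <${lower + width}: {'/' * k} {k}")
```

Integer division places a value such as $50 in the "$50 to <$60" class rather than the one below it, which is how the non-overlap rule is enforced.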
Frequency & Percentage Distribution
MegaMart — 20 expense claim amounts
Amount ($) | Count | % | Cumulative %
$40–<$50 | 3 | 15% | 15%
$50–<$60 | 5 | 25% | 40%
$60–<$70 | 4 | 20% | 60%
$70–<$80 | 4 | 20% | 80%
$80–<$90 | 3 | 15% | 95%
$90–<$100 | 1 | 5% | 100%
Total | 20 | 100% |
Audit Insight
60% of claims fall between $40 and $70. The 20% of claims above $80 represent a high-value tail worth scrutinising.
Percentage vs Frequency
Percentages (relative frequencies) allow comparison across different sample sizes — useful when comparing MegaMart stores of different sizes.
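The percentage and cumulative percentage columns follow directly from the class counts. A short sketch, using the counts from the frequency distribution and the standard library's `itertools.accumulate` for the running total:

```python
from itertools import accumulate

counts = [3, 5, 4, 4, 3, 1]     # class counts, $40s through $90s
total = sum(counts)             # grand total of sampled claims

# Relative frequency of each class, as a percentage.
percent = [100 * c / total for c in counts]

# Cumulative % = (running total / grand total) x 100.
cumulative = [100 * running / total for running in accumulate(counts)]

for pct, cum in zip(percent, cumulative):
    print(f"{pct:5.1f}%  {cum:6.1f}%")

# Share of claims from $50 up to (but not including) $70.
share_50_to_70 = percent[1] + percent[2]
print(f"$50 to <$70: {share_50_to_70:.0f}%")
```

The same two lines work unchanged for a store with a different sample size, which is the point of comparing relative rather than absolute frequencies.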
The Histogram
MegaMart expense claim distribution — frequency histogram
MegaMart — Expense Claim Amounts ($)
Key Rules
No gaps between bars (continuous data)
Bars must be equal width
Always label both axes
Always include a title
Audit Reading
The distribution is slightly right-skewed. The dark bars ($80+) flag 4 claims for review. A roughly symmetric distribution with no long tail would give less cause for concern, although symmetry alone does not rule out irregularities.
Quiz 2 of 5
Using MegaMart's frequency distribution, what percentage of expense claims fall between $50 and $70?
A. 25%
B. 40%
C. 45%
D. 60%
Section 3 of 6
Ogive
The cumulative percentage polygon — answers "what fraction of values fall below a threshold?"
Ogive (Cumulative % Polygon)
MegaMart — cumulative distribution of claim amounts
MegaMart Ogive — Expense Claims
How to Read It
To find "what % of claims are below $X", read up from the x-axis to the curve, then across to the y-axis.
Audit Application
The dashed lines show: 80% of claims are below $80. The top 20% — 4 claims over $80 — are the high-risk zone.
Vs Histogram
The ogive answers threshold questions ("below $70?") that a histogram cannot directly answer.
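Reading a value off the ogive amounts to linear interpolation between the plotted points. The sketch below assumes the usual construction: 0% at the lower boundary of the first class, then the cumulative percentage at each upper class boundary. The function name `percent_below` is illustrative.

```python
# Class boundaries ($) and the cumulative % reached at each one.
boundaries = [40, 50, 60, 70, 80, 90, 100]
cumulative_pct = [0, 15, 40, 60, 80, 95, 100]

def percent_below(x):
    """Estimate the % of claims below $x by reading along the ogive."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return 100.0
    for i in range(1, len(boundaries)):
        x0, x1 = boundaries[i - 1], boundaries[i]
        if x <= x1:
            y0, y1 = cumulative_pct[i - 1], cumulative_pct[i]
            # Straight-line segment between two plotted points.
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

print(percent_below(80))          # share of claims below $80
print(100 - percent_below(80))    # share in the high-risk zone
print(percent_below(65))          # an in-between threshold
```

Values at class boundaries are exact; values inside a class are estimates, because interpolation assumes claims are spread evenly across the class.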
Section 4 of 6
Scatterplots & Time Series
Two charts that reveal relationships and trends over time — powerful tools for forensic and management accounting.
The Scatterplot
MegaMart — does invoice value predict expense claim size?
Invoice Value vs Expense Claim ($)
What a Scatterplot Shows
Each dot = one observation (one invoice)
X-axis = first variable (invoice value)
Y-axis = second variable (claim amount)
Pattern = direction and strength of relationship
Audit Insight
A positive relationship exists — larger invoices tend to generate larger claims. The two dark dots are high outliers — claims disproportionately large for their invoice size. Flag for review.
Time Series Plot
MegaMart — monthly expense claim volume (Jan–Dec)
Monthly Expense Claims Submitted
What Makes It "Time Series"?
Time is always on the x-axis
Points are connected by lines to show trend
Data recorded at regular intervals
Cannot randomly sample — order matters
Audit Insight
Claim volume is rising sharply in Q4 (Oct–Dec). This could indicate year-end budget flushing — a known fraud risk pattern worth investigating.
Quiz 3 of 5
The MegaMart audit team wants to investigate whether larger invoices tend to produce larger expense claims. Which chart should they use?
A. Bar chart
B. Ogive
C. Scatterplot
D. Stem-and-leaf
Section 5 of 6
Bar, Pie & Pareto Charts
Visualising categorical data — and using the Pareto principle to focus on what matters most.
Bar Chart — Expense Categories
MegaMart: 143 expense claims across 6 categories
Expense Claims by Category
When to Use a Bar Chart
Categorical (not numerical) data
Comparing counts or amounts across groups
Horizontal bars work better with long category names
Audit Insight
Travel and Meals & Entertainment alone account for 83 claims (58% of all). These two categories are the natural starting point for any expense audit.
Pie Chart — Strengths & Limitations
MegaMart — same data, different chart
Expense Claims by Category
When Pie Charts Work
Showing part-to-whole relationships
Few categories (ideally 3–5)
Large differences between slices
Limitations for This Data
6 categories → slices are hard to compare
Similar-sized slices (8%, 13%, 15%) are indistinguishable
Cannot identify the biggest problem at a glance
Bottom line: The bar chart was clearer for this data. Pie charts often look appealing but communicate less effectively.
The Pareto Chart
MegaMart — identifying the highest-priority audit areas
Pareto Chart — Expense Claims by Category
What a Pareto Chart Does
Orders categories largest to smallest
Adds a cumulative % line on the right
Applies the 80/20 rule
The 80/20 Rule
Roughly three-quarters of total claims (about 73%) come from just 3 categories (Travel, M&E, Accommodation), close to the 80% that the rule takes as its benchmark. Auditors should concentrate resources there rather than spreading effort evenly across all 6.
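The ordering and cumulative line of a Pareto chart can be computed before anything is drawn. In the sketch below only the Travel (45) and Meals & Entertainment (38) counts come from the lecture; the other four counts, and the category names other than Accommodation, are hypothetical values chosen for illustration so that the total is 143.

```python
claims_by_category = {
    "Travel": 45,                  # from the lecture
    "Meals & Entertainment": 38,   # from the lecture
    "Accommodation": 21,           # hypothetical count
    "Office Supplies": 19,         # hypothetical name and count
    "Training": 12,                # hypothetical name and count
    "Other": 8,                    # hypothetical name and count
}

def pareto_table(counts):
    """Return (category, count, cumulative %) rows, largest count first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    rows, running = [], 0
    for name, count in ordered:
        running += count
        rows.append((name, count, round(100 * running / total, 1)))
    return rows

rows = pareto_table(claims_by_category)
for name, count, cum in rows:
    print(f"{name:<24}{count:>4}{cum:>8.1f}%")
```

With these illustrative counts the first two categories reach 58% and the first three reach about 73%, so the cumulative column shows at a glance where the bulk of the claims sits.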
Choosing the Right Categorical Chart
Chart | Best For | Weakness | Accounting Use
Bar chart | Comparing counts across categories | Doesn't show priorities | Claims by department
Pie chart | Part-to-whole, few categories | Hard to compare similar slices | Revenue by business unit (2–4 units)
Pareto chart | Finding biggest contributors fast | More complex to construct | Audit exceptions by type
Grouped bar | Two categorical variables | Can become cluttered | Claims by dept × quarter
Audit rule of thumb: When you need to prioritise limited audit resources, a Pareto chart almost always outperforms a bar chart or pie chart — it does the prioritisation for you.
Quiz 4 of 5
In MegaMart's expense data, Travel (45) and Meals & Entertainment (38) together account for how much of the 143 total claims?
A. 45%
B. 58%
C. 70%
D. 83%
Section 6 of 6
Graphical Excellence
Why some charts illuminate and others deceive — Tufte's principles every accountant should know.
Principles of Graphical Excellence
1. Show the data: The chart should foreground the data, not the design. Decoration that doesn't carry information is clutter.
2. Avoid distortion: The visual size of shapes should be proportional to the data values they represent.
3. Encourage comparison: The best charts invite viewers to compare data across categories, time, or groups.
4. Serve a clear purpose: Know whether you are describing, comparing, showing a relationship, or showing change over time, and pick the right chart.
5. Integrate with text: A graph should be explainable in one sentence. If you can't explain it, redesign it.
Tufte's Maxim
"Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space."
What to Avoid: Chartjunk & Lie Factor
Chartjunk
Visual decoration that adds no information:
3D effects on bar charts or pie charts
Excessive gridlines or background patterns
Decorative icons replacing data bars
Unnecessary legend boxes for a single series
In practice: Audit report graphs with chartjunk obscure findings and undermine credibility with clients and regulators.
The Lie Factor
When the visual size change ≠ the data size change.
Example: Revenue grew from $10M to $11M (10% increase). With a y-axis starting at $9.5M, the bars are drawn 0.5 and 1.5 units tall, so the second bar is three times the height of the first: a 200% visual jump.
Solution: Always start y-axis at zero for bar charts.
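Tufte's lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below applies that ratio to a two-bar chart; the function name `lie_factor` and its parameters are illustrative, and bar height is assumed to be the value minus the axis starting point.

```python
def lie_factor(first, second, axis_start=0.0):
    """Ratio of the visual change in bar height to the change in the data."""
    data_change = (second - first) / first            # relative change in the data
    first_height = first - axis_start                 # drawn height of bar 1
    second_height = second - axis_start               # drawn height of bar 2
    visual_change = (second_height - first_height) / first_height
    return visual_change / data_change

# Revenue of $10M then $11M, drawn on a truncated and on a zero-based axis.
truncated = lie_factor(10.0, 11.0, axis_start=9.5)
honest = lie_factor(10.0, 11.0, axis_start=0.0)
print(f"axis from $9.5M: lie factor = {truncated:.1f}")
print(f"axis from zero:  lie factor = {honest:.1f}")
```

A lie factor of 1 means the picture changes by the same proportion as the data; the further the value is from 1, the more the chart distorts.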
How Companies Manipulate Charts
Truncating the y-axis (starting at a non-zero value) is the most common way companies visually exaggerate revenue growth — without technically lying.
Which Chart, When?
Use this as your quick reference for selecting the right chart in assessments, reports, and audit work.
Numerical | Distribution of one numerical variable, preserve values | Stem-and-Leaf
Numerical | Distribution of one numerical variable, grouped | Histogram
Numerical | Cumulative distribution, threshold questions | Ogive
Two vars | Relationship between two numerical variables | Scatterplot
Time | Trend of a numerical variable over time | Time Series
Categorical | Counts across categories, easy comparison | Bar Chart
Categorical | Part-to-whole, few categories, large differences | Pie Chart
Categorical | Prioritise categories, apply 80/20 rule | Pareto
Bonus Section
Multidimensional Visualisation
Going beyond two axes — encoding 3, 4, or more variables in a single chart.
Beyond Two Dimensions
Standard charts show two variables (x and y). But real accounting data has many dimensions — here are three ways to encode more.
Colour
A 3rd variable can be encoded as the colour of points, bars, or areas.
Example: Scatterplot with points coloured by department.
Multiple Panels
Repeat the same chart for each level of a categorical variable — called a "small multiples" display.
Example: One histogram per MegaMart store.
Bubble Size
A 4th variable can be encoded as the size of points in a scatterplot — creating a bubble chart.
Example: Bubble size = total claim value.
Rule: Each additional dimension adds cognitive load. Only add dimensions if they reveal a pattern you couldn't show otherwise. Never add dimensions for decoration.
Colour: Adding a 3rd Variable
MegaMart — Invoice value vs Claim amount, coloured by department
3-Variable Scatterplot (X, Y, Colour = Dept)
What This Shows
Three variables in one chart: invoice value (x), claim amount (y), and department (colour). The Finance department's claims are concentrated in the high-value upper-right zone.
X: Invoice value (numerical)
Y: Claim amount (numerical)
Colour: Department (categorical)
Bubble Plot: Four Variables at Once
MegaMart — audit risk summary by store
Audit Risk Bubble Plot (4 Variables)
Four Variables Encoded
X: Avg invoice value (numerical)
Y: Days to approve claim (numerical)
Size: Total claim value (numerical)
Colour: Risk level (categorical)
Audit Reading
Store 3 (large dark bubble, upper right) has high invoice values, slow approvals, and large claim totals — the highest priority for audit investigation.
Quiz 5 of 5
A bubble plot of MegaMart stores shows: x-axis = invoice value, y-axis = days to approve, bubble size = claim total, colour = risk level. How many variables does this chart display?
A. 2
B. 3
C. 4
D. 5
Charts in Accounting Careers
Every visualisation we've covered today appears in real accounting and audit work.
Chart | Audit | Management Accounting | Tax & Compliance
Stem-and-leaf | Quick field review of sampled values | Analysing cost distributions | n/a
Histogram | Distribution of sampled items | Production output variance | Distribution of deduction amounts
Ogive | "What % of items exceed threshold X?" | Cumulative cost reporting | Percentile analysis of lodgements
Scatterplot | Detecting unusual claim/invoice ratios | Cost vs volume analysis | Spotting regression anomalies
Time series | Detecting seasonal fraud patterns | Monthly P&L trends | Lodgement volume over time
Pareto | Prioritising audit exceptions | Cost reduction priorities | Top compliance breach types
Common Graphing Mistakes to Avoid
Structural Errors
Gaps between histogram bars (not allowed for continuous data)
Unequal class widths without adjustment
Y-axis not starting at zero (bar charts)
Omitting axis labels or chart title
Chart Choice Errors
Pie chart with more than 5 categories
Bar chart when a Pareto would prioritise better
Time series when data isn't time-ordered
Interpretation Errors
Confusing correlation with causation in scatterplots
Ignoring outliers instead of investigating them
Concluding "no pattern" from a noisy chart without statistics
Exam tip: In assessments, you will be asked to identify errors in charts — and these are the most common ones that appear.
Brain Teasers
Think through these before revealing — they often appear in a different form in assessments.
1. What is the primary advantage of a stem-and-leaf display over a histogram?
The stem-and-leaf preserves every original value while still showing the shape of the distribution. A histogram only shows frequency counts per class — the individual values are lost.
2. How do companies visually exaggerate revenue growth in their annual reports without falsifying numbers?
By truncating the y-axis — starting it at a non-zero value so that even small absolute changes appear as large visual jumps. A 5% revenue increase can be made to look like a doubling.
3. Can you randomly sample a time series? Why or why not?
No. The time element is part of the data — the sequence matters. Randomly sampling a time series would destroy the ordering and make any trend analysis meaningless. You must keep the observations in time order.
4. A Pareto chart shows that 3 categories account for 78% of audit exceptions. A colleague says "let's focus equally on all categories." How would you respond?
The Pareto principle (80/20 rule) tells us to focus effort where it produces the greatest return. Spreading resources equally across all categories ignores the evidence. Concentrating on the top 3 categories will address nearly 80% of the problem with a fraction of the effort.
After This Lecture
Review
Re-read lecture material and study guide pages
Complete all recommended textbook problems
Attempt at least some additional problems
Excel Practice
Produce a histogram using Excel (Demo Problem 2.2)
Construct a Pareto chart from categorical data
Plot a time series and identify any trend
Before Tutorial
Consider: If you were the lead auditor on the MegaMart engagement, which single chart would you present to the audit committee — and why?
Key Formula to Know
Class width = Range ÷ Number of classes
Cumulative % = (Running total ÷ Grand total) × 100