Week 2 · Statistics for Accounting

Presenting Data

Turning raw numbers into accounting insight
Running Scenario: MegaMart Retail Audit — expense claims & invoice analysis

Learning Objectives

1. Construct a stem-and-leaf plot by hand and interpret its shape
2. Build a frequency distribution and draw a histogram and ogive
3. Produce and interpret scatter plots and time series plots
4. Select and compare bar charts, pie charts and Pareto diagrams
5. Apply graphical excellence principles and avoid common errors
6. Use multidimensional visualisation to encode more than two variables

The MegaMart Audit

Running Scenario — used for every chart type throughout this lecture

The Client

MegaMart is a national retailer with 12 stores. Your firm has been engaged to audit employee expense claims submitted over the past financial year.

The Audit Question

Are there unusual patterns in claim amounts, categories, or timing that might indicate errors or irregularities?

Total claims: 568
Sample size: 20
Categories: 6

Our Tools Today

  • Stem-and-leaf — see every value
  • Histogram & ogive — see the shape
  • Pareto — find the biggest problems
  • Scatter & time series — spot trends

Raw Data → Information

MegaMart — 20 sampled expense claim amounts ($)
52 67 43 89 71 55 48 93 62 77
58 84 46 69 75 51 88 63 72 56

Step 1. Raw data: individual values, hard to interpret at a glance.
Step 2. Organise: sort, group, tabulate.
Step 3. Visualise: charts reveal patterns instantly.
Step 4. Insight: the auditor can draw conclusions.

Key question throughout today: What does this data tell us about MegaMart's expense claims — and which chart tells it best?
Section 1 of 6

Stem-and-Leaf Plots

The one chart in this lecture that shows every individual value as well as the overall shape; especially useful when sample sizes are small.

What is a Stem-and-Leaf Plot?

The Idea

Split each data value into a stem (leading digits) and a leaf (final digit). The result is a display that acts like both a sorted list and a histogram.

Why Auditors Use It

  • Preserves every original value
  • Instantly shows the distribution shape
  • Highlights outliers (unusually high claims)
  • Quick to construct by hand in the field

Anatomy of a Stem-and-Leaf

Example: values 52, 55, 58
Stem | Leaves
   5 | 2 5 8
Stem = tens digit (5 = $50s). Leaf = units digit.

Building the Stem-and-Leaf

MegaMart — 20 expense claim amounts, ordered: $43 to $93

Step 1. Sort values from smallest to largest.
Step 2. Choose stems (tens digits: 4, 5, 6, 7, 8, 9).
Step 3. Write each units digit as a leaf next to its stem.

Ordered Values

43 46 48 51 52 55 56 58
62 63 67 69 71 72 75 77
84 88 89 93
4 | 3 6 8
5 | 1 2 5 6 8
6 | 2 3 7 9
7 | 1 2 5 7
8 | 4 8 9
9 | 3
Stem unit = $10. Leaf unit = $1. The four highest values ($84, $88, $89, $93) are potential outliers for audit investigation.
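The three construction steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative whole-dollar amounts so that the stem is the tens digit and the leaf is the units digit; the function name `stem_and_leaf` is hypothetical. Run on the 20 MegaMart claims, it reproduces the plot above row for row.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return one text row per stem, e.g. '5 | 1 2 5 6 8'.

    Assumes non-negative whole-dollar amounts: stem = tens digit(s),
    leaf = units digit (stem unit $10, leaf unit $1).
    """
    leaves = defaultdict(list)
    for v in sorted(values):          # Step 1: sort smallest to largest
        stem, leaf = divmod(v, 10)    # Step 2: split into stem and leaf
        leaves[stem].append(leaf)     # Step 3: write the leaf next to its stem
    rows = []
    # Walk every stem from smallest to largest so empty stems still show as gaps
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {leaf_text}".rstrip())
    return rows


claims = [52, 67, 43, 89, 71, 55, 48, 93, 62, 77,
          58, 84, 46, 69, 75, 51, 88, 63, 72, 56]

for row in stem_and_leaf(claims):
    print(row)
```

Keeping empty stems (for example, a stem with no leaves between two occupied ones) matters: a gap in the plot is itself a sign of a possible outlier.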

Reading the Stem-and-Leaf

MegaMart audit — what does the plot tell us?
4 | 3 6 8
5 | 1 2 5 6 8
6 | 2 3 7 9
7 | 1 2 5 7
8 | 4 8 9
9 | 3
Advantage over histogram: all original values are preserved.

Shape

Slight right skew: claims cluster in the $50–$79 range (13 of 20 values), with a thinner tail of higher values up to $93.

Centre

The median is $65: with 20 values it is the average of the 10th and 11th ordered values ($63 and $67).

Audit Flag

The $80–$93 cluster (4 values) stands out. These 4 claims warrant closer review — are they legitimate or inflated?
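The "Centre" and "Shape" readings can be checked with Python's standard `statistics` module. This short sketch computes the median (which comes out at exactly $65) and the mean ($65.95). Comparing mean against median is only a rough skew heuristic, added here as an assumption rather than a formal test.

```python
from statistics import mean, median

claims = [52, 67, 43, 89, 71, 55, 48, 93, 62, 77,
          58, 84, 46, 69, 75, 51, 88, 63, 72, 56]

centre = median(claims)   # even n: average of the 10th and 11th ordered values
average = mean(claims)

print(f"median = ${centre:.2f}, mean = ${average:.2f}")
# Rough skew heuristic: a mean above the median hints at a right (high-value) tail
print("mean > median:", average > centre)
```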

Quiz 1 of 5
An auditor needs to examine 20 sampled invoice amounts while keeping every individual value visible. Which display is best suited?
A. Histogram
B. Stem-and-leaf plot
C. Pie chart
D. Time series plot
Section 2 of 6

Frequency Distributions & Histograms

Group your data into classes — then visualise the distribution as a bar chart with no gaps.

Building a Frequency Distribution

MegaMart — 20 expense claims, range $43–$93

Step-by-Step

  • 1. Find range: $93 − $43 = $50
  • 2. Choose classes: 6 classes
  • 3. Class width: $50 ÷ 6 ≈ $8.33, rounded up to a convenient $10
  • 4. Start point: $40 (round down)
  • 5. Tally each value into its class
Rule of thumb: 5–20 classes; equal width; classes must not overlap.
Claim Amount ($) | Tally | Count
$40 to <$50 | /// | 3
$50 to <$60 | ///// | 5
$60 to <$70 | //// | 4
$70 to <$80 | //// | 4
$80 to <$90 | /// | 3
$90 to <$100 | / | 1
Total | | 20
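The five steps above can be expressed as a small Python function. This is a sketch, not a prescribed method: the function name `frequency_distribution` and its `round_to` parameter are hypothetical, and rounding the raw width up to the next multiple of $10 is an assumption that matches the choice made on this slide. Classes are half-open (`lower <= x < upper`), so none overlap.

```python
import math


def frequency_distribution(values, n_classes, start, round_to=10):
    """Return (lower, upper, count) rows for classes lower <= x < upper."""
    data_range = max(values) - min(values)                # Step 1: range
    raw_width = data_range / n_classes                     # Step 3: range / classes
    width = math.ceil(raw_width / round_to) * round_to     # round UP to a convenient width
    rows = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        count = sum(lower <= v < upper for v in values)    # Step 5: tally
        rows.append((lower, upper, count))
    # Classes must cover every value exactly once
    if sum(count for _, _, count in rows) != len(values):
        raise ValueError("classes do not cover all values; adjust start or width")
    return rows


claims = [52, 67, 43, 89, 71, 55, 48, 93, 62, 77,
          58, 84, 46, 69, 75, 51, 88, 63, 72, 56]

rows = frequency_distribution(claims, n_classes=6, start=40)
for lower, upper, count in rows:
    print(f"${lower} to <${upper}: {count}")
```

Rounding the width up (never down) guarantees the classes reach past the maximum value; rounding $8.33 down to $8 would leave the $93 claim outside the last class.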

Frequency & Percentage Distribution

MegaMart — 20 expense claim amounts
Amount ($) | Count | % | Cumulative %
$40–<$50 | 3 | 15% | 15%
$50–<$60 | 5 | 25% | 40%
$60–<$70 | 4 | 20% | 60%
$70–<$80 | 4 | 20% | 80%
$80–<$90 | 3 | 15% | 95%
$90–<$100 | 1 | 5% | 100%
Total | 20 | 100% |
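The percentage and cumulative percentage columns follow from two formulas: % = count ÷ total × 100, and cumulative % = running total ÷ grand total × 100. A minimal Python sketch (the function name `percentage_table` is hypothetical) applied to the class counts above reproduces the table:

```python
def percentage_table(counts):
    """Return (count, percent, cumulative percent) for each class, in order."""
    total = sum(counts)
    running = 0
    rows = []
    for c in counts:
        running += c                                   # running total so far
        rows.append((c, 100 * c / total, 100 * running / total))
    return rows


counts = [3, 5, 4, 4, 3, 1]  # MegaMart class counts, $40-<$50 up to $90-<$100
table = percentage_table(counts)
for count, pct, cum in table:
    print(f"{count:>2}  {pct:5.1f}%  {cum:5.1f}%")
```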

Audit Insight

60% of claims fall between $40 and $70. The 20% of claims above $80 represent a high-value tail worth scrutinising.

Percentage vs Frequency

Percentages (relative frequencies) allow comparison across different sample sizes — useful when comparing MegaMart stores of different sizes.


The Histogram

MegaMart expense claim distribution — frequency histogram
[Figure: MegaMart, Expense Claim Amounts ($). Frequency histogram with classes $40–50, $50–60, $60–70, $70–80, $80–90, $90–100 and frequencies 3, 5, 4, 4, 3, 1. x-axis: Claim Amount ($); y-axis: Frequency.]

Key Rules

  • No gaps between bars (continuous data)
  • Bars must be equal width
  • Always label both axes
  • Always include a title

Audit Reading

The distribution is slightly right-skewed. The bars at $80 and above flag 4 claims for review. A roughly symmetric distribution with no high-value tail would offer fewer obvious leads, although symmetry alone cannot rule out errors or irregularities.

Quiz 2 of 5
Using MegaMart's frequency distribution, what percentage of expense claims fall between $50 and $70?
A. 25%
B. 40%
C. 45%
D. 60%
Section 3 of 6

Ogive

The cumulative percentage polygon — answers "what fraction of values fall below a threshold?"

Ogive (Cumulative % Polygon)

MegaMart — cumulative distribution of claim amounts
[Figure: MegaMart Ogive, Expense Claims. Cumulative % (0% to 100%) plotted against upper class boundary ($40 to $100), with dashed guide lines marking 80% at $80. x-axis: Upper Class Boundary ($); y-axis: Cumulative %.]

How to Read It

To find "what % of claims are below $X", read up from the x-axis to the curve, then across to the y-axis.

Audit Application

The dashed lines show: 80% of claims are below $80. The top 20% — 4 claims over $80 — are the high-risk zone.

Vs Histogram

The ogive answers threshold questions ("below $70?") that a histogram cannot directly answer.
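Reading the ogive "up to the curve, then across" amounts to linear interpolation between the plotted points. The sketch below assumes the ogive is drawn as straight segments between upper class boundaries, starting at 0% at $40; the function name `pct_below` is hypothetical. Answers at class boundaries (such as $80 giving 80%) come straight from the table; answers between boundaries (such as $75 giving 70%) are estimates that assume claims are spread evenly within each class.

```python
from bisect import bisect_right

# Points plotted on the MegaMart ogive: (upper class boundary, cumulative %)
boundaries = [40, 50, 60, 70, 80, 90, 100]
cum_pct = [0, 15, 40, 60, 80, 95, 100]


def pct_below(x):
    """Estimated % of claims below x, read off the ogive by straight-line interpolation."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return 100.0
    i = bisect_right(boundaries, x) - 1        # index of the segment containing x
    x0, x1 = boundaries[i], boundaries[i + 1]
    y0, y1 = cum_pct[i], cum_pct[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


print(pct_below(80))  # 80.0
print(pct_below(75))  # 70.0
```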

Section 4 of 6

Scatterplots & Time Series

Two charts that reveal relationships and trends over time — powerful tools for forensic and management accounting.

The Scatterplot

MegaMart — does invoice value predict expense claim size?
[Figure: Invoice Value vs Expense Claim ($). Scatterplot; x-axis: Invoice Value ($), roughly $100 to $550; y-axis: Claim ($), roughly $40 to $100.]

What a Scatterplot Shows

  • Each dot = one observation (one invoice)
  • X-axis = first variable (invoice value)
  • Y-axis = second variable (claim amount)
  • Pattern = direction and strength of relationship

Audit Insight

A positive relationship appears: larger invoices tend to generate larger claims. The two highlighted points are high outliers, claims disproportionately large for their invoice size. Flag them for review.
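A scatterplot shows direction and strength visually; the Pearson correlation coefficient r puts a number on it (close to +1 for a strong positive linear pattern, close to 0 for none). This is a swapped-in technique not covered on this slide, shown as a minimal Python sketch. The invoice and claim lists below are hypothetical illustration values, not MegaMart data (the slide's underlying points are not given). On Python 3.10 or later, `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)                    # spread of x
    syy = sum((y - my) ** 2 for y in ys)                    # spread of y
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL data for illustration only
invoices = [120, 180, 250, 310, 400, 470, 530]
claims = [48, 55, 60, 66, 74, 81, 88]
print(f"r = {pearson_r(invoices, claims):.3f}")
```

A high r still says nothing about causation, and it will not by itself pick out the two outlying claims; those are best found by inspecting the plot or the claim-to-invoice ratios.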


Time Series Plot

MegaMart — monthly expense claim volume (Jan–Dec)
[Figure: Monthly Expense Claims Submitted. Line plot of claims per month, Jan to Dec; y-axis: Claims (20 to 60).]

What Makes It "Time Series"?

  • Time is always on the x-axis
  • Points are connected by lines to show trend
  • Data recorded at regular intervals
  • Cannot randomly sample — order matters

Audit Insight

Claim volume is rising sharply in Q4 (Oct–Dec). This could indicate year-end budget flushing — a known fraud risk pattern worth investigating.
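One way to put a number on a "Q4 spike" is to compare the average monthly volume in Oct–Dec with the Jan–Sep average, alongside a trailing moving average that smooths month-to-month noise. Both measures are assumptions chosen for illustration rather than methods from this lecture, and the monthly figures below are hypothetical, not read from the MegaMart chart. Note that both functions rely on the values staying in time order.

```python
def moving_average(series, window=3):
    """Trailing moving average; the first window-1 positions have no value (None)."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out


def q4_vs_rest(monthly):
    """Ratio of the Oct-Dec average to the Jan-Sep average (12 values, Jan to Dec)."""
    if len(monthly) != 12:
        raise ValueError("expected 12 monthly values, Jan to Dec")
    q4 = sum(monthly[9:]) / 3      # Oct, Nov, Dec
    rest = sum(monthly[:9]) / 9    # Jan to Sep
    return q4 / rest


# HYPOTHETICAL monthly claim counts, Jan to Dec
monthly = [31, 29, 33, 30, 32, 28, 31, 30, 34, 44, 51, 57]
print(moving_average(monthly))
print(f"Q4 runs at {q4_vs_rest(monthly):.2f}x the Jan-Sep monthly average")
```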

Quiz 3 of 5
The MegaMart audit team wants to investigate whether larger invoices tend to produce larger expense claims. Which chart should they use?
A. Bar chart
B. Ogive
C. Scatterplot
D. Stem-and-leaf
Section 5 of 6

Bar, Pie & Pareto Charts

Visualising categorical data — and using the Pareto principle to focus on what matters most.

Bar Chart — Expense Categories

MegaMart: 143 expense claims across 6 categories
[Figure: Expense Claims by Category. Bar chart of number of claims: Travel 45, Meals & Ent. 38, Accommodation 22, Office Supplies 18, Training 12, Misc 8. y-axis: Number of Claims.]

When to Use a Bar Chart

  • Categorical (not numerical) data
  • Comparing counts or amounts across groups
  • Horizontal bars work better with long category names

Audit Insight

Travel and Meals & Entertainment alone account for 83 claims (58% of all). These two categories are the natural starting point for any expense audit.


Pie Chart — Strengths & Limitations

MegaMart — same data, different chart
[Figure: Expense Claims by Category. Pie chart: Travel 31%, M&E 27%, Accom 15%, Office 13%, Training 8%, Misc 6%.]

When Pie Charts Work

  • Showing part-to-whole relationships
  • Few categories (ideally 3–5)
  • Large differences between slices

Limitations for This Data

  • 6 categories → slices are hard to compare
  • Similar-sized slices (e.g. 13% vs 15%) are hard to tell apart
  • Cannot identify the biggest problem at a glance
Bottom line: The bar chart was clearer for this data. Pie charts often look appealing but communicate less effectively.
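Each pie slice's angle is its share of the total times 360 degrees. This small Python sketch (the function name `slice_table` is hypothetical) computes percentages and angles from the category counts on the bar chart slide. It shows Travel at about 31.5% and Misc at about 5.6%, and that the Accommodation and Office Supplies slices differ by only about 10 degrees, which is why they are hard to compare by eye.

```python
def slice_table(counts):
    """Return (category, percent, slice angle in degrees) for each pie slice."""
    total = sum(counts.values())
    return [(name, 100 * c / total, 360 * c / total) for name, c in counts.items()]


counts = {"Travel": 45, "M&E": 38, "Accom": 22,
          "Office": 18, "Training": 12, "Misc": 8}

rows = slice_table(counts)
for name, pct, degrees in rows:
    print(f"{name:<9} {pct:5.1f}%  {degrees:6.1f} degrees")
```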

The Pareto Chart

MegaMart — identifying the highest-priority audit areas
[Figure: Pareto Chart, Expense Claims by Category. Bars for Travel, M&E, Accom, Office, Training, Misc in descending order (Frequency, left axis), with a cumulative % line (right axis) and an 80% reference line.]

What a Pareto Chart Does

  • Orders categories largest to smallest
  • Adds a cumulative % line, read against a second axis on the right
  • Makes the 80/20 (Pareto) principle visible

The 80/20 Rule

About 73% of claims come from just 3 categories (Travel, M&E, Accommodation), and adding Office Supplies takes the total to 86%. Auditors should concentrate resources on these top categories rather than spreading effort evenly across all 6.
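The Pareto ordering and its cumulative % line can be computed directly from the category counts. In this Python sketch (function names `pareto` and `categories_to_reach` are hypothetical), the cumulative shares come out at about 31.5%, 58.0%, 73.4%, 86.0%, 94.4% and 100%, so the 80% line is first crossed after the fourth category.

```python
def pareto(counts):
    """Sort categories largest first and attach the cumulative % of the total."""
    total = sum(counts.values())
    running = 0
    rows = []
    for name, c in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += c
        rows.append((name, c, 100 * running / total))
    return rows


def categories_to_reach(rows, target=80.0):
    """How many of the top categories are needed for the cumulative % to reach target."""
    for k, (_, _, cum) in enumerate(rows, start=1):
        if cum >= target:
            return k
    return len(rows)


counts = {"Travel": 45, "M&E": 38, "Accom": 22,
          "Office": 18, "Training": 12, "Misc": 8}

rows = pareto(counts)
for name, c, cum in rows:
    print(f"{name:<9} {c:>3}  {cum:5.1f}%")
print("categories needed for 80%:", categories_to_reach(rows))
```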


Choosing the Right Categorical Chart

Chart | Best For | Weakness | Accounting Use
Bar chart | Comparing counts across categories | Doesn't show priorities | Claims by department
Pie chart | Part-to-whole, few categories | Hard to compare similar slices | Revenue by business unit (2–4 units)
Pareto chart | Finding biggest contributors fast | More complex to construct | Audit exceptions by type
Grouped bar | Two categorical variables | Can become cluttered | Claims by dept × quarter
Audit rule of thumb: When you need to prioritise limited audit resources, a Pareto chart almost always outperforms a bar chart or pie chart — it does the prioritisation for you.
Quiz 4 of 5
In MegaMart's expense data, Travel (45) and Meals & Entertainment (38) together account for how much of the 143 total claims?
A. 45%
B. 58%
C. 70%
D. 83%
Section 6 of 6

Graphical Excellence

Why some charts illuminate and others deceive — Tufte's principles every accountant should know.

Principles of Graphical Excellence

1. Show the data: the chart should foreground the data, not the design. Decoration that doesn't carry information is clutter.
2. Avoid distortion: the visual size of shapes should be proportional to the data values they represent.
3. Encourage comparison: the best charts invite viewers to compare data across categories, time, or groups.
4. Serve a clear purpose: know whether you are describing, comparing, showing a relationship, or showing change over time, and pick the right chart.
5. Integrate with text: a graph should be explainable in one sentence. If you can't explain it, redesign it.

Tufte's Maxim

"Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space."


What to Avoid: Chartjunk & Lie Factor

Chartjunk

Visual decoration that adds no information:

  • 3D effects on bar charts or pie charts
  • Excessive gridlines or background patterns
  • Decorative icons replacing data bars
  • Unnecessary legend boxes for a single series
In practice: Audit report graphs with chartjunk obscure findings and undermine credibility with clients and regulators.

The Lie Factor

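When the visual size change ≠ the data size change. Lie Factor = (size of effect shown in the graphic) ÷ (size of effect in the data); an honest chart has a Lie Factor close to 1.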

Example: Revenue grew from $10M to $11M (a 10% increase). With the y-axis starting at $9.5M, the bars are drawn 0.5 and 1.5 units tall, so the increase looks like a 200% jump (a Lie Factor of 20).

Solution: Always start y-axis at zero for bar charts.

How Companies Manipulate Charts

Truncating the y-axis (starting it at a non-zero value) is one of the most common ways companies visually exaggerate revenue growth without technically lying.
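The Lie Factor for a truncated bar chart can be computed from the two data values and the axis baseline. This Python sketch follows Tufte's ratio (effect shown ÷ effect in the data); the function name `lie_factor` and its `baseline` parameter are hypothetical. For the revenue example, a zero baseline gives 1.0, a $9M baseline gives 10, and a $9.5M baseline gives 20.

```python
def lie_factor(old, new, baseline=0.0):
    """Tufte's Lie Factor for a two-bar chart whose value axis starts at `baseline`.

    Effect in the data    = (new - old) / old
    Effect in the graphic = change in drawn bar height / original drawn height
    """
    drawn_old, drawn_new = old - baseline, new - baseline  # visible bar heights
    if drawn_old <= 0:
        raise ValueError("baseline must sit below the smaller value")
    data_effect = (new - old) / old
    graphic_effect = (drawn_new - drawn_old) / drawn_old
    return graphic_effect / data_effect


print(lie_factor(10, 11))                          # 1.0: honest, axis starts at zero
print(round(lie_factor(10, 11, baseline=9.5), 2))  # 20.0: a 10% rise drawn as 200%
```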


Which Chart, When?

Use this as your quick reference for selecting the right chart in assessments, reports, and audit work.

Data Type | Purpose | Chart
Numerical | Distribution of one numerical variable, preserving values | Stem-and-Leaf
Numerical | Distribution of one numerical variable, grouped | Histogram
Numerical | Cumulative distribution, threshold questions | Ogive
Two variables | Relationship between two numerical variables | Scatterplot
Time | Trend of a numerical variable over time | Time Series
Categorical | Counts across categories, easy comparison | Bar Chart
Categorical | Part-to-whole, few categories, large differences | Pie Chart
Categorical | Prioritise categories, apply 80/20 rule | Pareto
Bonus Section

Multidimensional Visualisation

Going beyond two axes — encoding 3, 4, or more variables in a single chart.

Beyond Two Dimensions

Standard charts show two variables (x and y). But real accounting data has many dimensions — here are three ways to encode more.

Colour

A 3rd variable can be encoded as the colour of points, bars, or areas.

Example: Scatterplot with points coloured by department.

Multiple Panels

Repeat the same chart for each level of a categorical variable — called a "small multiples" display.

Example: One histogram per MegaMart store.

Bubble Size

A 4th variable can be encoded as the size of points in a scatterplot — creating a bubble chart.

Example: Bubble size = total claim value.

Rule: Each additional dimension adds cognitive load. Only add dimensions if they reveal a pattern you couldn't show otherwise. Never add dimensions for decoration.

Colour: Adding a 3rd Variable

MegaMart — Invoice value vs Claim amount, coloured by department
[Figure: 3-Variable Scatterplot (X, Y, Colour = Dept). x-axis: Invoice Value ($); y-axis: Claim ($); colour legend: Sales, Operations, Finance.]

What This Shows

Three variables in one chart: invoice value (x), claim amount (y), and department (colour). The Finance department's claims are concentrated in the high-value upper-right zone.

X: Invoice value (numerical)
Y: Claim amount (numerical)
Colour: Department (categorical)

Bubble Plot: Four Variables at Once

MegaMart — audit risk summary by store
[Figure: Audit Risk Bubble Plot (4 Variables). One bubble per store, S1 to S6. x-axis: Avg Invoice Value ($), Low to High; y-axis: Days to Approve, Fast to Slow.]

Four Variables Encoded

X: Avg invoice value (numerical)
Y: Days to approve claim (numerical)
Size: Total claim value (numerical)
Colour: Risk level (categorical)

Audit Reading

Store 3 (large dark bubble, upper right) has high invoice values, slow approvals, and large claim totals — the highest priority for audit investigation.
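Bubble size ties back to the "avoid distortion" principle: if a bubble's radius is scaled with the value, its area (what the eye actually judges) grows with the square, so a store with twice the claim total looks four times as big. This Python sketch scales bubble area, not radius, in proportion to the value. The function name `bubble_radii`, the `max_radius` parameter and the store totals are hypothetical. If you plot with matplotlib, its `scatter` marker size `s` is already specified as an area (points squared), so values scaled linearly into `s` keep area proportional.

```python
import math


def bubble_radii(values, max_radius=30.0):
    """Radius for each bubble so that bubble AREA is proportional to its value.

    Area grows with radius squared, so radius must grow with sqrt(value).
    The largest value gets max_radius.
    """
    biggest = max(values)
    return [max_radius * math.sqrt(v / biggest) for v in values]


# HYPOTHETICAL total claim values for six stores
store_totals = [12_000, 18_500, 41_000, 9_800, 15_200, 22_300]
for store, r in zip(["S1", "S2", "S3", "S4", "S5", "S6"], bubble_radii(store_totals)):
    print(f"{store}: radius {r:.1f}")
```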

Quiz 5 of 5
A bubble plot of MegaMart stores shows: x-axis = invoice value, y-axis = days to approve, bubble size = claim total, colour = risk level. How many variables does this chart display?
A. 2
B. 3
C. 4
D. 5

Charts in Accounting Careers

Every visualisation we've covered today appears in real accounting and audit work.

Chart | Audit | Management Accounting | Tax & Compliance
Stem-and-leaf | Quick field review of sampled values | Analysing cost distributions |
Histogram | Distribution of sampled items | Production output variance | Distribution of deduction amounts
Ogive | "What % of items exceed threshold X?" | Cumulative cost reporting | Percentile analysis of lodgements
Scatterplot | Detecting unusual claim/invoice ratios | Cost vs volume analysis | Spotting regression anomalies
Time series | Detecting seasonal fraud patterns | Monthly P&L trends | Lodgement volume over time
Pareto | Prioritising audit exceptions | Cost reduction priorities | Top compliance breach types

Common Graphing Mistakes to Avoid

Structural Errors

  • Gaps between histogram bars (not allowed for continuous data)
  • Unequal class widths without adjustment
  • Y-axis not starting at zero (bar charts)
  • Omitting axis labels or chart title

Chart Choice Errors

  • Pie chart with more than 5 categories
  • Bar chart when a Pareto would prioritise better
  • Time series when data isn't time-ordered

Interpretation Errors

  • Confusing correlation with causation in scatterplots
  • Ignoring outliers instead of investigating them
  • Concluding "no pattern" from a noisy chart without statistics
Exam tip: In assessments, you will be asked to identify errors in charts — and these are the most common ones that appear.

Brain Teasers

Think through these before revealing — they often appear in a different form in assessments.

1. What is the primary advantage of a stem-and-leaf display over a histogram?
The stem-and-leaf preserves every original value while still showing the shape of the distribution. A histogram only shows frequency counts per class — the individual values are lost.
2. How do companies visually exaggerate revenue growth in their annual reports without falsifying numbers?
By truncating the y-axis — starting it at a non-zero value so that even small absolute changes appear as large visual jumps. A 5% revenue increase can be made to look like a doubling.
3. Can you randomly sample a time series? Why or why not?
No. The time element is part of the data — the sequence matters. Randomly sampling a time series would destroy the ordering and make any trend analysis meaningless. You must keep the observations in time order.
4. A Pareto chart shows that 3 categories account for 78% of audit exceptions. A colleague says "let's focus equally on all categories." How would you respond?
The Pareto principle (80/20 rule) tells us to focus effort where it produces the greatest return. Spreading resources equally across all categories ignores the evidence. Concentrating on the top 3 categories will address nearly 80% of the problem with a fraction of the effort.

After This Lecture

Review

  • Re-read lecture material and study guide pages
  • Complete all recommended textbook problems
  • Attempt at least some additional problems

Excel Practice

  • Produce a histogram using Excel (Demo Problem 2.2)
  • Construct a Pareto chart from categorical data
  • Plot a time series and identify any trend

Before Tutorial

Consider: If you were the lead auditor on the MegaMart engagement, which single chart would you present to the audit committee — and why?

Key Formula to Know

Class width = Range ÷ Number of classes (rounded up to a convenient value)

Cumulative % = (Running total ÷ Grand total) × 100
Week 2 Complete

Six tools. One dataset. Infinite insight.

Stem-and-leaf: see every value + shape
Histogram & Ogive: distribution + thresholds
Scatter + Time Series: relationships + trends
Bar / Pie / Pareto: categorical priorities
Graphical Excellence: clarity over decoration
Multidim Viz: colour, size, panels