Statistics for Accounting — Week 4

Introduction
to Probability

Sample spaces, conditional probability, Bayes’ theorem, and counting rules
3 Excel exercises with step-by-step solutions included

By the end of this week, you will be able to…

Demonstrate understanding of basic probability concepts including sample spaces, events, contingency tables, marginal and joint probability
Use and compute conditional probabilities from contingency tables and probability matrices
Apply the addition and multiplication rules to find union and intersection probabilities
Explain and demonstrate statistical independence and mutual exclusivity
Apply Bayes’ theorem to update probabilities based on new evidence
Use counting rules—permutations and combinations—to count sample space outcomes
Section 01

Probability Foundations

Sample spaces, events, and measuring uncertainty with relative frequencies

What is Probability?

Probability measures the likelihood of an outcome occurring. It always lies between 0 and 1.

Core Definition
P(Event) = Number of favourable outcomes / Total outcomes in sample space

Key Vocabulary

  • Experiment — any process producing an outcome
  • Sample Space (S) — set of all possible outcomes
  • Event — a subset of the sample space
  • Elementary event — exactly one outcome
Accounting context: When an auditor randomly selects a transaction to test, the sample space is all transactions in the population. The event “transaction contains an error” is a subset of that space.

Relative Frequency (PP4.1)

If A occurs 1050 times out of 3000 total:

P(A) = 1050 / 3000 = 0.35
The total of all mutually exclusive probabilities must sum to 1.0.

Complement Rule

The complement of event A (written A′ or Â) contains all outcomes not in A.

Complement Rule
P(A′) = 1 − P(A)

The event and its complement together exhaust the sample space:

P(A) + P(A′) = 1
PP4.8: If German parts arrive late with P = 0.08, then the probability of not arriving late = 1 − 0.08 = 0.92.
Sample Space S A P(A) A′ 1−P(A)
Why it matters for accounting: If you know the probability of a transaction being correct, the complement gives you the error probability—useful for audit risk assessment.
Section 02

Contingency Tables & Probability Matrices

Organising two-variable data to find joint and marginal probabilities

Contingency Tables

A contingency table cross-tabulates counts for two categorical variables. Divide every cell by the grand total to produce a probability matrix.

PP4.12: n = 50 cases

DETotal
A16824
B10616
C8210
Total341650

Divide by 50 →

DETotal
A0.320.160.48
B0.200.120.32
C0.160.040.20
Total0.680.321.00

Reading the matrix

  • Marginal probability — row or column totals
    e.g. P(A) = 0.48
  • Joint probability — interior cells
    e.g. P(A ∩ E) = 0.16 (highlighted)
  • Grand total — always 1.00
Marginal & Joint
P(A) = 24/50 = 0.48  [marginal]
P(A ∩ E) = 8/50 = 0.16  [joint]
Think of it like this: marginal probabilities live in the margins; joint probabilities live in the interior cells.
Section 03

The Addition Rule

Finding the probability of A or B occurring (union)

Addition Rule — P(A ∪ B)

General Addition Rule
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

We subtract the intersection to avoid double-counting outcomes in both events.

Mutually Exclusive Events (special case)
If A ∩ B = ∅, then:
P(A ∪ B) = P(A) + P(B)
PP4.20 Health Survey:
P(cancer) = 0.60, P(heart) = 0.29, P(both) = 0.25
P(cancer or heart) = 0.60 + 0.29 − 0.25 = 0.64
A B A∩B P(A ∪ B) = shaded area
Mutually exclusive means events cannot happen at the same time. E.g., a company cannot simultaneously be “profitable” and “at a loss” in the same period.
Section 04

Conditional Probability & the Multiplication Rule

Updating probabilities when we know something has already occurred

Conditional Probability

P(A | B) reads: “the probability of A given that B has already occurred.”

Conditional Probability
P(A | B) = P(A ∩ B) / P(B)
Multiplication Rule (rearranged)
P(A ∩ B) = P(A | B) × P(B)
              = P(B | A) × P(A)

PP4.22: Given P(A) = 0.40, P(B | A) = 0.25:

P(A ∩ B) = 0.25 × 0.40 = 0.10

Activity 4.1 — Office Temperature

120 workers; 70 start early, 50 start late. Of early starters, 30 prefer 24°C (40 prefer 20°C). Of late starters, 23 prefer 20°C (27 prefer 24°C).

24°C20°CTotal
Early304070
Late272350
Total5763120
P(Early | 20°C) = P(Early ∩ 20°C) / P(20°C)
= (40/120) / (63/120) = 40/63 ≈ 0.635
Section 05

Statistical Independence

When knowing one event tells us nothing about another

Statistical Independence

Events A and B are independent if knowing B occurred gives no information about A.

Test for Independence
P(A | B) = P(A)   or equivalently:
P(A ∩ B) = P(A) × P(B)

PP4.30 — Museum Visits

VisitedNot visitedTotal
Children <1016080240
No children40120160
Total200200400

Independence test:

P(Visited) = 200/400 = 0.50
P(Visited | Children) = 160/240 = 0.667

0.667 ≠ 0.50  →  NOT independent
Having young children increases the probability of visiting a museum — the variables are associated.
Independence ≠ Mutually exclusive. Independent events can both occur. Mutually exclusive events cannot. A company having both a tax issue and an OH&S breach in the same year is possible (not mutually exclusive), but the two events may or may not be statistically independent.
Section 06

Bayes’ Theorem

Reversing the direction of conditioning to update prior beliefs

Bayes’ Theorem

Bayes’ Theorem
P(A | B) =  P(B | A) × P(A)
             ——————————
             P(B)

Where P(B) = P(B|A)P(A) + P(B|A′)P(A′)

Activity 4.3 — Traffic Violations

  • P(age ≥30) = 0.60  |  P(age <30) = 0.40
  • P(violation | ≥30) = 0.04
  • P(violation | <30) = 0.10
  • Given a violation → P(under 30)?
Solution
P(B) = 0.04×0.60 + 0.10×0.40
      = 0.024 + 0.040 = 0.064

P(<30 | violation) = (0.10 × 0.40) / 0.064
                     = 0.040 / 0.064
                     = 0.625
Although drivers under 30 are only 40% of all drivers, they account for 62.5% of all violations — a powerful result from updating our prior belief with new evidence.
Accounting use: Bayes’ theorem underpins audit sampling—updating the probability of material misstatement after each audit procedure.
Section 07

Counting Rules

Permutations, combinations, and the fundamental counting principle

Counting Rules

Fundamental Counting Principle

If there are m ways to do one thing and n ways to do another, there are m × n ways total.

Act 4.6 (Licence Plates):
263 × 103 = 17,576,000 plates

Permutations (order matters)

nPr Formula
nPr = n! / (n−r)!
Act 4.5 (Race): 9P3 = 9!/(9−3)! = 504 ways to award 1st, 2nd, 3rd

Combinations (order does NOT matter)

nCr Formula
nCr = n! / [r! × (n−r)!]
Act 4.7 (Lottery): 45C6 = 45! / (6! × 39!) = 8,145,060 combinations
Memory tip:
Permutation = Position matters (podium places).
Combination = Choosing a committee (no ranking).
Act 4.8 (High Court):
7C3 × 4C2 = 35 × 6 = 210 ways
Excel Exercises

Three Applied Exercises

Build probability skills in Excel using realistic accounting and audit scenarios. Complete each exercise before checking the step-by-step solutions at the end of this deck.

Exercise 1
Excel

Audit Compliance Probability Matrix

Scenario: You are an audit manager at a mid-tier accounting firm. Your team reviewed 200 client files, classified by industry and compliance status after the most recent audit.
CompliantMinor IssuesMajor IssuesTotal
Financial Services4812464
Retail5218676
Construction3220860
Total1325018200

Your Tasks

  • A
    Build this table in Excel and create a probability matrix (divide all cells by the grand total)
  • B
    Compute marginal probabilities for each industry and each compliance status
  • C
    Find P(Major Issues) and P(Construction ∩ Major Issues)
  • D
    Using the addition rule, find P(Retail ∪ Major Issues)
  • E
    Find P(Major Issues | Construction) and compare it to P(Major Issues). Are compliance issues and being in Construction independent?
Exercise 2
Excel

Credit Card Ownership & Conditional Probability

Scenario: A financial services firm surveys 500 clients about credit card ownership and whether they have ever missed a payment. Use conditional probability to assess default risk by card type.
Never missedMissed ≥1Total
Platinum Card14035175
Standard Card220105325
Total360140500

Your Tasks

  • A
    Convert the table to a probability matrix in Excel
  • B
    Compute P(Missed ≥1 payment) and P(Platinum Card)
  • C
    Calculate P(Missed | Platinum) and P(Missed | Standard) using the conditional probability formula
  • D
    Test whether card type and payment behaviour are statistically independent. Show your working.
  • E
    Use the multiplication rule to verify P(Platinum ∩ Missed) directly from your conditional probabilities
Exercise 3
Excel

Fraud Detection with Bayes’ Theorem

Scenario: A forensic accounting team uses an automated AI-based transaction screening tool to flag potentially fraudulent transactions. The following performance data is known from prior experience:
Prior Information
P(Fraud)3% of all transactions are fraudulent
P(Flag | Fraud)System correctly flags 92% of fraudulent transactions
P(Flag | No Fraud)System incorrectly flags 6% of legitimate transactions (false positive rate)
Key question: An invoice has been flagged. What is the actual probability that it is fraudulent?

Your Tasks

  • A
    Set up a Bayes table in Excel with columns: Hypothesis, Prior P(H), Likelihood P(Flag|H), Joint P(Flag ∩ H), Posterior P(H|Flag)
  • B
    Calculate the total probability of being flagged P(Flag) using the law of total probability
  • C
    Apply Bayes’ theorem to find P(Fraud | Flagged)
  • D
    Interpret the result. Why is the posterior probability so much lower than you might expect given the system’s 92% detection rate?
  • E
    Sensitivity analysis: In Excel, change the base rate P(Fraud) from 3% to 10%. How does P(Fraud | Flagged) change? What does this tell you about audit risk?
Solutions

Step-by-Step Excel Solutions

Detailed instructions for all three exercises, including every formula to type

Exercise 1 — Solution Part A–C
Excel

Audit Compliance Matrix — Setup & Marginal Probabilities

Step 1: Enter the Raw Data

1
In cell A1 type Industry. In B1: Compliant, C1: Minor Issues, D1: Major Issues, E1: Total.
2
Enter industry names in A2:A4 (Financial Services, Retail, Construction) and row totals in E2:E4. Enter all count data in B2:D4.
3
In row 5, compute column totals: =SUM(B2:B4) in B5, then copy across to D5. Grand total in E5: =SUM(E2:E4). Verify it equals 200.

Step 2: Build the Probability Matrix

4
In cell G1, start a duplicate table. In H2: =B2/$E$5. Note the $ signs lock the grand total. Copy this formula across H2:J4.
5
Format all probability cells: select range, right-click → Format Cells → Number → 4 decimal places.

Step 3: Marginal & Joint Probabilities

6
Row marginals (industry totals): in K2: =SUM(H2:J2). Copy to K3:K4.
7
Column marginals (compliance status): in H5: =SUM(H2:H4). Copy to I5:J5.

Answers for Part C

P(Major Issues)18/200 = 0.0900
P(Construction ∩ Major Issues)8/200 = 0.0400

Answer for Part D — Addition Rule

P(Retail ∪ Major) = P(Retail) + P(Major) − P(Retail ∩ Major)
= 0.38 + 0.09 − 0.03 = 0.44
8
In a new cell: =K3+J5-J3 (Retail marginal + Major Issues marginal − intersection cell). Should return 0.44.
Exercise 1 — Solution Part E
Excel

Audit Compliance Matrix — Independence Test

Step 4: Conditional Probability

9
P(Major Issues | Construction) = P(Construction ∩ Major Issues) / P(Construction)
In Excel: =J4/K4
P(Major | Construction)0.04 / 0.30 = 0.1333
P(Major Issues) [marginal]0.0900
Independence check:
If independent: P(Major | Construction) should equal P(Major Issues)
0.1333 ≠ 0.0900  →  NOT independent
10
In a labelled cell, type: =IF(ABS(J4/K4-J5)<0.001,"Independent","NOT Independent") to automate the test.

Interpretation

Construction clients are 48% more likely to have Major Issues than the overall population suggests (13.3% vs 9.0%). This is an actionable audit insight: Construction should receive higher-intensity audit procedures.

Alternative independence check

If independent: P(A ∩ B) = P(A) × P(B)
Expected P(Construction ∩ Major) = 0.30 × 0.09 = 0.027
Actual P(Construction ∩ Major) = 0.04
0.04 ≠ 0.027  →  NOT independent
11
In Excel: =K4*J5 gives 0.0270; compare to =J4 which gives 0.0400. Use =IF(ABS(J4-(K4*J5))<0.001,"Independent","NOT Independent").
Exercise 2 — Solution
Excel

Credit Card Ownership — Full Solution

Setup

1
Enter the raw counts in B2:C3 with row and column headers. Grand total 500 in D4.
2
Probability matrix: =B2/$D$4 in F2, copy to F2:G3. Row/column marginals via =SUM().

Part B — Marginal Probabilities

P(Missed ≥1 payment)140/500 = 0.2800
P(Platinum Card)175/500 = 0.3500

Part C — Conditional Probabilities

3
P(Missed | Platinum) = P(Platinum ∩ Missed) / P(Platinum)
In Excel: =G2/H2 where G2 = joint, H2 = Platinum marginal.
P(Missed | Platinum)35/175 = 0.2000
P(Missed | Standard)105/325 = 0.3231

Part D — Independence Test

P(Missed) = 0.2800
P(Missed | Platinum) = 0.2000
P(Missed | Standard) = 0.3231

0.2000 ≠ 0.2800  →  NOT independent
Platinum card holders miss payments at a lower rate (20%) than the overall population (28%). Card type and payment behaviour are associated. Standard cardholders carry more default risk.

Part E — Multiplication Rule Verification

4
P(Platinum ∩ Missed) via multiplication rule:
= P(Missed|Platinum) × P(Platinum)
= 0.20 × 0.35 = 0.0700
5
Verify directly: 35/500 = 0.0700. Both methods agree.
6
In Excel: =G2/H2*H2 should equal =G2. Use this to confirm your formulas are consistent.
Exercise 3 — Solution Parts A–C
Excel

Fraud Detection — Bayes Table Setup

Step 1: Set Up the Bayes Table

1
Create the following column headers in row 1:
A1: Hypothesis
B1: Prior P(H)
C1: Likelihood P(Flag|H)
D1: Joint P(Flag ∩ H)
E1: Posterior P(H|Flag)
2
Enter hypotheses in A2:A3: “Fraud” and “Not Fraud”.
3
Priors in B2:B3: 0.03 and 0.97. Confirm they sum to 1: =SUM(B2:B3) should return 1.
4
Likelihoods in C2:C3: 0.92 (true positive rate) and 0.06 (false positive rate).

Step 2: Compute Joints & Posterior

5
Joint probabilities in D2: =B2*C2. In D3: =B3*C3.
These represent P(Flagged AND Fraud) and P(Flagged AND Not Fraud).
6
Total probability of being flagged in D4: =SUM(D2:D3). This is P(Flagged).
7
Posterior probabilities in E2: =D2/$D$4. In E3: =D3/$D$4. Verify =SUM(E2:E3) equals 1.

Answers

P(Flag) = total probability0.0276 + 0.0582 = 0.0858
P(Fraud | Flagged)0.0276 / 0.0858 = 0.3217
P(Not Fraud | Flagged)0.0582 / 0.0858 = 0.6783
Exercise 3 — Solution Parts D–E
Excel

Fraud Detection — Interpretation & Sensitivity

Part D — Why only 32%?

Even though the system detects 92% of fraud, a flagged transaction is only fraudulent 32% of the time. This is because fraud is very rare (only 3% base rate), so the absolute number of false positives overwhelms the true positives.
Out of 10,000 transactions:
• 300 fraudulent → 276 correctly flagged (TP)
• 9,700 legitimate → 582 incorrectly flagged (FP)

Of 858 total flags: only 276 are real fraud
276/858 = 0.32  →  32% precision
This is the base rate fallacy — a high detection rate does not guarantee a high posterior probability when the prior is low. Critical lesson for audit analytics.

Part E — Sensitivity Analysis in Excel

8
Change B2 from 0.03 to 0.10 (and B3 to 0.90). All formulas update automatically because you used cell references.
P(Fraud) base rateP(Fraud | Flagged)
1%0.1341
3%0.3217
10%0.6311
20%0.7931
Insight for auditors: When reviewing a high-risk client (higher P(Fraud) prior), the same flagging tool becomes far more reliable. Audit resources should be concentrated where fraud base rates are known to be elevated — exactly the rationale for risk-based auditing.
9
To automate a sensitivity table: use Excel’s Data → What-If Analysis → Data Table with B2 as the row input cell and E2 as the output formula.
Week 4 Complete

Summary

This week connected probability theory to real accounting practice: from reading contingency tables, applying the addition and multiplication rules, testing independence, through to updating beliefs using Bayes’ theorem. The fraud detection exercise demonstrates why base rates matter — a lesson central to audit risk assessment and forensic accounting.

Exercise 1
Audit Compliance
Probability Matrix
Exercise 2
Credit Card
Conditional Probability
Exercise 3
Fraud Detection
Bayes’ Theorem