Week 4 · Statistics for Accounting
Probability
Quantifying uncertainty — from theory to credit risk decisions
This Week
Learning Outcomes
1. Define probability and apply the three approaches to credit risk scenarios
2. Build and interpret contingency tables; compute marginal and joint probabilities
3. Apply addition and multiplication rules to combined loan events
4. Calculate conditional probabilities and test for statistical independence
5. Use Bayes' theorem to update default risk estimates with new evidence
6. Apply the five counting rules to audit sampling and portfolio problems
01
Section 1
Foundations of Probability
What probability means, the three ways to assign it, and the language of sample spaces and events — grounded in loan assessment.
Foundations
What is Probability?
Credit Risk Context

A bank wants to know: how likely is a new applicant to default? Probability gives us a rigorous language to answer this.

So What?

Every loan approval model, credit score, and loan loss provision rests on probability estimates.

Definition
Probability is a number between 0 and 1 that measures how likely an event is to occur.

P(A) = 0 → impossible  ·  P(A) = 1 → certain
Classical

Equal likelihood

Count favourable outcomes ÷ total outcomes. All outcomes equally likely.

3 loan grades, equally likely → P(Grade A) = 1/3
Relative Freq.

Past data

Proportion of a specific outcome observed over many trials.

120 of 500 applicants defaulted → P(default) = 0.24
Subjective

Expert judgment

No repeatable experiment; based on informed belief.

Analyst: "40% chance this sector deteriorates next quarter."
Foundations
Sample Spaces & Events
Loan Portfolio Scenario

500 loan applications. Each applicant is either employed or unemployed, and either defaults or does not default.

Key Terms

Experiment: Randomly selecting one applicant.
Sample space (S): All possible outcomes.
Event (A): A subset of S we care about.

[Figure: Venn diagram of sample space S (500 applicants). Event A = Default (120); Event B = Employed (360); A ∩ B = Employed & Default (40); A only = Unemployed & Default (80); B only = Employed & No Default (320); outside A ∪ B = Unemployed & No Default (60).]
Quick Check — Section 1
A bank reviews 1,000 past loan accounts and finds 180 defaulted. Which approach to probability is the bank using when it estimates P(default) = 0.18?
A Classical — all outcomes are equally likely
B Relative frequency — based on historical proportions
C Subjective — based on analyst judgment
D Classical — 180 out of 1,000 favourable outcomes

B is correct. The bank is computing the proportion of a specific outcome (default) over many observed trials — that's the relative frequency approach. Classical probability requires all outcomes to be equally likely (not true here — not every applicant has the same default risk). Subjective probability uses expert judgment rather than observed data.

02
Section 2
Contingency Tables & Probability
Organising loan portfolio data into a two-way table to read off marginal and joint probabilities.
Contingency Tables
The Loan Portfolio Dataset
Scenario

A commercial bank reviews 500 closed loan accounts, classified by employment status at application and whether the loan ultimately defaulted.

Why a Table?

A contingency table reveals how two categorical variables relate — essential for credit risk segmentation and loan pricing.

Employment Status | Default | No Default | Row Total
Employed | 40 | 320 | 360
Unemployed | 80 | 60 | 140
Column Total | 120 | 380 | 500
  • Rows represent employment categories
  • Columns represent loan outcome (default vs no default)
  • Each cell counts applicants in that specific combination
  • Row and column totals are called marginals — they summarise each variable alone
Contingency Tables
Marginal Probability
Definition

Probability of one event alone, ignoring the other variable. Found in the row/column totals (the "margins").

P(A) = Marginal Total ÷ 500
Employment | Default | No Default | Row Total
Employed | 40 | 320 | 360
Unemployed | 80 | 60 | 140
Total | 120 | 380 | 500
1. P(Employed) = 360/500 = 0.72 → 72% of applicants were employed
2. P(Default) = 120/500 = 0.24 → 24% of loans ended in default
3. P(No Default) = 380/500 = 0.76 → complement: 1 − 0.24 = 0.76
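The marginal calculations above can be reproduced in a few lines. This is a minimal Python sketch, assuming the table is stored as a dictionary keyed by (employment, outcome); the names `counts` and `marginal` are illustrative and not part of the course material.

```python
# Cell counts from the 500-account contingency table in this section.
counts = {
    ("Employed", "Default"): 40,
    ("Employed", "No Default"): 320,
    ("Unemployed", "Default"): 80,
    ("Unemployed", "No Default"): 60,
}

total = sum(counts.values())  # grand total of accounts


def marginal(label, position):
    """Sum every cell whose key has `label` at `position` (0 = row, 1 = column),
    then divide by the grand total."""
    return sum(n for key, n in counts.items() if key[position] == label) / total


p_employed = marginal("Employed", 0)
p_default = marginal("Default", 1)
p_no_default = marginal("No Default", 1)

print(f"P(Employed)   = {p_employed:.2f}")
print(f"P(Default)    = {p_default:.2f}")
print(f"P(No Default) = {p_no_default:.2f}")
```

Summing cells rather than hard-coding the row and column totals keeps the margins consistent with the interior of the table if a count changes.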
Contingency Tables
Joint Probability
Definition

Probability of two events occurring together. Found in the interior cells of the table — not the margins.

P(A∩B) = Cell Count ÷ 500
Employment | Default | No Default | Row Total
Employed | 40 | 320 | 360
Unemployed | 80 | 60 | 140
Total | 120 | 380 | 500
P(Employed ∩ Default) = 40/500 = 0.08
P(Employed ∩ No Default) = 320/500 = 0.64
P(Unemployed ∩ Default) = 80/500 = 0.16
So What?

Joint probabilities identify the riskiest combinations. Unemployed & Default (0.16) is twice as likely as Employed & Default (0.08).
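As a quick check on the joint probabilities, the sketch below divides every interior cell by the grand total and compares the two default cells. It is an illustrative Python example; the dictionary layout and variable names are assumptions, while the counts come from the table above.

```python
# Interior cell counts from the contingency table.
counts = {
    ("Employed", "Default"): 40,
    ("Employed", "No Default"): 320,
    ("Unemployed", "Default"): 80,
    ("Unemployed", "No Default"): 60,
}
total = sum(counts.values())

# Joint probability of each cell: cell count / grand total.
joint = {cell: n / total for cell, n in counts.items()}

for (employment, outcome), p in joint.items():
    print(f"P({employment} ∩ {outcome}) = {p:.2f}")

# The four joint probabilities cover the whole sample space, so they sum to 1.
joint_sum = sum(joint.values())

# Compare the two default cells.
ratio = joint[("Unemployed", "Default")] / joint[("Employed", "Default")]
print(f"Unemployed & Default is {ratio:.1f}x as likely as Employed & Default")
```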

03
Section 3
Addition & Multiplication Rules
Rules for computing the probability of A or B (addition) and A and B (multiplication) across loan events.
Addition Rule
P(A or B) — Union of Events
Bank Question

What is the probability that a randomly selected applicant is either unemployed OR defaults?

Key Insight

Without the formula we'd double-count applicants who are both unemployed and defaulted.

P(A∪B) = P(A) + P(B) − P(A∩B)
General Addition Rule — subtract the overlap to avoid double-counting
1. Let A = Unemployed → P(A) = 140/500 = 0.28
2. Let B = Default → P(B) = 120/500 = 0.24
3. Overlap: P(A∩B) = 80/500 = 0.16
4. P(A∪B) = 0.28 + 0.24 − 0.16 = 0.36

Special case — Mutually Exclusive: If A and B cannot both occur, P(A∩B) = 0, so P(A∪B) = P(A) + P(B). Example: a loan can't be both "approved" and "rejected" simultaneously.
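One way to convince yourself that the subtraction is needed is to compute the union twice: once with the addition rule and once by counting directly every applicant who is unemployed or defaulted. The Python sketch below does both; the helper `prob` and the dictionary layout are illustrative assumptions.

```python
counts = {
    ("Employed", "Default"): 40,
    ("Employed", "No Default"): 320,
    ("Unemployed", "Default"): 80,
    ("Unemployed", "No Default"): 60,
}
total = sum(counts.values())


def prob(predicate):
    """Probability that a randomly selected applicant satisfies `predicate`."""
    return sum(n for cell, n in counts.items() if predicate(cell)) / total


p_a = prob(lambda c: c[0] == "Unemployed")                             # P(A)
p_b = prob(lambda c: c[1] == "Default")                                # P(B)
p_a_and_b = prob(lambda c: c[0] == "Unemployed" and c[1] == "Default")  # P(A∩B)

# General addition rule: subtract the overlap once.
p_union_rule = p_a + p_b - p_a_and_b

# Direct count: each applicant in A or B is counted exactly once.
p_union_direct = prob(lambda c: c[0] == "Unemployed" or c[1] == "Default")

print(f"Addition rule: {p_union_rule:.2f}")
print(f"Direct count:  {p_union_direct:.2f}")
print(f"Naive sum without subtraction: {p_a + p_b:.2f}")
```

The naive sum overstates the union by exactly P(A∩B), the applicants who appear in both the unemployed row and the default column.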

Multiplication Rule
P(A and B) — Intersection
Two Versions

General: Works for any two events.
Special (independent): When events don't affect each other.

Accounting Use

Probability that two independent internal controls both fail simultaneously.

P(A∩B) = P(A) × P(B|A)
General — probability of A, then B given A already occurred
P(A∩B) = P(A) × P(B)
Special — ONLY when A and B are statistically independent
Example:
P(Unemployed) = 0.28  ·  P(Default|Unemployed) = 80/140 ≈ 0.571
P(Unemployed ∩ Default) = 0.28 × 0.571 = 0.16 ✓ — matches our contingency table cell!
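The rounded figure 0.571 hides the fact that the general rule reproduces the table cell exactly. The sketch below, an illustrative Python example using the standard `fractions` module, keeps the arithmetic exact and also shows what goes wrong if the special (independence) version is applied to this data. Variable names are assumptions.

```python
from fractions import Fraction

p_unemployed = Fraction(140, 500)               # P(A)
p_default = Fraction(120, 500)                  # P(B)
p_default_given_unemployed = Fraction(80, 140)  # P(B|A)

# General multiplication rule: P(A∩B) = P(A) × P(B|A).
p_joint_general = p_unemployed * p_default_given_unemployed

# Special rule P(A) × P(B): only valid if A and B are independent.
p_joint_if_independent = p_unemployed * p_default

print("General rule:", p_joint_general, "=", float(p_joint_general))
print("Special rule:", p_joint_if_independent, "=", float(p_joint_if_independent))
```

The general rule returns 80/500 exactly. The special rule gives a different number, which is a sign that employment status and default are not independent in this portfolio.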
Quick Check — Section 3
Using the loan dataset (500 applicants, 360 employed, 380 no-default, 320 employed & no-default): What is P(Employed OR No Default)?
A 0.72 + 0.76 = 1.48 — add the marginal probabilities
B 0.72 + 0.76 − 0.08 = 1.40 — subtract the wrong overlap
C 0.72 + 0.76 − 0.64 = 0.84 — correct: subtract the joint
D 0.72 × 0.76 = 0.547 — multiply the marginals

C is correct. General Addition Rule: P(Employed ∪ No Default) = P(E) + P(ND) − P(E∩ND) = 360/500 + 380/500 − 320/500 = 0.72 + 0.76 − 0.64 = 0.84. Option A forgets to subtract the overlap entirely. Option B subtracts P(Employed ∩ Default) = 0.08 — the wrong cell. Option D uses multiplication, which gives "and" not "or".

04
Section 4
Conditional Probability & Independence
How knowing one thing about an applicant changes our default risk estimate — and when it tells us nothing new at all.
Conditional Probability
P(A | B) — Given that B occurred
Bank Question

We already know an applicant is unemployed. How does this change our estimate of their default probability?

P(A|B) = P(A∩B) / P(B)
Restrict the sample space to B; find A within it
Employment | Default | No Default | Row Total
Employed | 40 | 320 | 360
Unemployed | 80 | 60 | 140
Total | 120 | 380 | 500
1. Restrict to unemployed applicants: new total = 140
2. Among those, how many defaulted? → 80
3. P(Default|Unemployed) = 80/140 = 0.571
Compare

P(Default|Employed) = 40/360 = 0.111. Unemployment multiplies default risk by about 5.1 (0.571 ÷ 0.111). This is exactly what credit models should capture in their segmentation.
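The "restrict the sample space" idea translates directly into code: pick one row of the table, then take the share of that row that defaulted. The sketch below is a minimal Python illustration; the function name `conditional` is an assumption, and "relative risk" is the usual name for the ratio of two conditional probabilities rather than a term used on the slides.

```python
counts = {
    ("Employed", "Default"): 40,
    ("Employed", "No Default"): 320,
    ("Unemployed", "Default"): 80,
    ("Unemployed", "No Default"): 60,
}


def conditional(outcome, given_employment):
    """P(outcome | employment): restrict to one row, then take the share with `outcome`."""
    row_total = sum(n for (emp, _), n in counts.items() if emp == given_employment)
    return counts[(given_employment, outcome)] / row_total


p_d_given_u = conditional("Default", "Unemployed")  # 80 / 140
p_d_given_e = conditional("Default", "Employed")    # 40 / 360

# Ratio of the two conditional default probabilities (relative risk).
relative_risk = p_d_given_u / p_d_given_e

print(f"P(Default|Unemployed) = {p_d_given_u:.3f}")
print(f"P(Default|Employed)   = {p_d_given_e:.3f}")
print(f"Relative risk         = {relative_risk:.2f}")
```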

Conditional Probability
Tree Diagrams
Purpose

Tree diagrams map all paths through sequential events and make conditional probabilities visual.

Reading the Tree

Branches show conditional probabilities. Multiplying along a path gives the joint probability at the leaf.

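[Figure: tree diagram starting from "Select applicant"; each path is listed below]
Employed (0.72) → Default (0.111): P = 0.72 × 0.111 = 0.08
Employed (0.72) → No Default (0.889): P = 0.72 × 0.889 = 0.64
Unemployed (0.28) → Default (0.571): P = 0.28 × 0.571 = 0.16
Unemployed (0.28) → No Default (0.429): P = 0.28 × 0.429 = 0.12
Σ = 0.08 + 0.64 + 0.16 + 0.12 = 1.00 ✓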
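A tree can be represented as nested branches, with each leaf probability obtained by multiplying along its path. The Python sketch below is one possible encoding, assuming a simple dictionary structure; it uses exact fractions so the leaves match the table cells without rounding.

```python
from fractions import Fraction

# First-level branches carry P(employment); second-level branches carry
# P(outcome | employment), taken from the row totals of the table.
tree = {
    "Employed": (Fraction(360, 500), {
        "Default": Fraction(40, 360),
        "No Default": Fraction(320, 360),
    }),
    "Unemployed": (Fraction(140, 500), {
        "Default": Fraction(80, 140),
        "No Default": Fraction(60, 140),
    }),
}

# Multiply along each path to get the joint probability at the leaf.
leaves = {}
for status, (p_status, branches) in tree.items():
    for outcome, p_outcome_given_status in branches.items():
        leaves[(status, outcome)] = p_status * p_outcome_given_status

for path, p in leaves.items():
    print(path, float(p))

# All leaves together cover every applicant, so they must sum to 1.
leaf_total = sum(leaves.values())
print("Sum of leaves:", leaf_total)
```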
Statistical Independence
Does knowing one thing change anything?
Test for Independence

A and B are independent if:
P(A|B) = P(A)  ·  equivalently:
P(A∩B) = P(A) × P(B)

P(A|B) = P(A) → Independent

Dependent — Our Data

P(Default|Unemployed) = 0.571

P(Default) = 0.24

0.571 ≠ 0.24

Employment status does affect default risk. Events are dependent.

P(D|U) ≠ P(D) → DEPENDENT

Independent — Hypothetical

P(Default|Unemployed) = 0.24

P(Default) = 0.24

0.24 = 0.24

Knowing employment status tells us nothing new about default risk.

P(D|U) = P(D) → INDEPENDENT
Credit Risk Implication

If employment and default were independent, segmenting loans by employment status would add zero value. Our data shows strong dependence — so segmentation genuinely improves risk pricing.
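The product form of the independence condition is easy to check numerically. The sketch below is an illustrative Python helper (the name `is_independent` is an assumption). It compares the stated probabilities directly and does not account for sampling variation, so treat it as a check of the definition rather than a formal statistical test.

```python
import math


def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """True when P(A∩B) equals P(A) × P(B) within a small tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)


# Our data: A = Unemployed, B = Default.
p_unemployed, p_default = 0.28, 0.24
observed = is_independent(p_unemployed, p_default, 0.16)

# Hypothetical portfolio where P(Default|Unemployed) = P(Default) = 0.24,
# so the joint probability is simply the product of the marginals.
hypothetical = is_independent(p_unemployed, p_default, p_unemployed * 0.24)

print("Observed data independent?", observed)
print("Hypothetical data independent?", hypothetical)
print("Product of marginals:", round(p_unemployed * p_default, 4), "vs observed joint 0.16")
```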

05
Section 5
Bayes' Theorem
The formula that lets a bank update its default probability estimate when new information — like a missed payment — arrives mid-loan.
Bayes' Theorem
Updating Beliefs with Evidence
The Problem

A borrower just missed their first payment. Should the bank reclassify them as high risk? By exactly how much should the estimated default probability increase?

Why Bayes Matters

Foundation of credit scoring models, fraud detection systems, and dynamic loan provisioning — any system that updates beliefs with new data.

Three Key Terms
Prior probability: Initial estimate before new evidence.

Likelihood: How probable is the evidence if the event is true?

Posterior probability: Updated estimate after seeing the evidence.
Prior: 30% (high risk before payment data) + Evidence: missed payment = Posterior: 63.2% (high risk after evidence)
Bayes' Theorem
The Formula
P(A|B) = P(B|A) × P(A) / P(B)
where P(B) = P(B|A)·P(A) + P(B|Aᶜ)·P(Aᶜ) — the Law of Total Probability
P(A|B) — Posterior
What We Want
Probability of A given we observed B
P(B|A) — Likelihood
Evidence Strength
How probable is evidence B if A is true?
P(A) — Prior
Initial Belief
Our estimate before seeing evidence B
P(B) — Normaliser
Total Probability of B
Probability of observing B across all scenarios
Bayes' Theorem
Worked Example
Setup

30% of applicants are classified High Risk (HR).
P(Missed Payment | HR) = 0.80
P(Missed Payment | Low Risk) = 0.20

A borrower misses month-1 payment. What is P(HR | Missed Payment)?

1. Prior: P(HR) = 0.30  ·  P(LR) = 0.70
2. Likelihoods: P(MP|HR) = 0.80  ·  P(MP|LR) = 0.20
3. P(MP) = (0.80)(0.30) + (0.20)(0.70) = 0.24 + 0.14 = 0.38
4. P(HR|MP) = (0.80 × 0.30) / 0.38 = 0.24 / 0.38 = 0.632
Probability of High Risk: 30% (0.30) before evidence → 63.2% (0.632) after missed payment
Interpretation

One missed payment more than doubles the estimated high-risk probability. This is why early payment behaviour is so predictive in credit scoring models.
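The four steps of the worked example fit in one small function. This is a minimal Python sketch for the two-class case (high risk vs low risk); the function and parameter names are illustrative assumptions, and the inputs are the figures from the setup above.

```python
def posterior(prior, lik_if_true, lik_if_false):
    """P(A|B) via Bayes' theorem for a two-class problem.

    prior        -- P(A)
    lik_if_true  -- P(B|A)
    lik_if_false -- P(B|not A)
    """
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
    evidence = lik_if_true * prior + lik_if_false * (1 - prior)
    return lik_if_true * prior / evidence


p_hr_given_mp = posterior(prior=0.30, lik_if_true=0.80, lik_if_false=0.20)

print(f"P(HR)      = 0.30")
print(f"P(HR | MP) = {p_hr_given_mp:.3f}")
print(f"Increase   = {p_hr_given_mp / 0.30:.2f}x")
```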

Quick Check — Section 5
In the Bayes' example, if the prior P(High Risk) increased to 50% and all likelihoods stayed the same, what would happen to P(High Risk | Missed Payment)?
A Stay the same — the prior doesn't affect the posterior
B Increase — a higher prior pushes the posterior up
C Decrease — more high-risk applicants dilutes the signal
D Become exactly 1.0 — certainty of high risk

B is correct. With P(HR) = 0.50: P(MP) = (0.80)(0.50) + (0.20)(0.50) = 0.40 + 0.10 = 0.50. P(HR|MP) = (0.80 × 0.50) / 0.50 = 0.80. The posterior rose from 0.632 to 0.80. With the likelihoods held fixed, a higher prior yields a higher posterior, because raising P(HR) increases the high-risk share of all missed payments. Option D is wrong: the posterior reaches 1.0 only if low-risk borrowers never miss a payment (P(MP|LR) = 0) or the prior is already 1. Here P(MP|LR) = 0.20 and P(LR) = 0.50, so some missed payments still come from low-risk borrowers.
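To see how the posterior responds to the prior, the sketch below recomputes P(HR | MP) for a range of priors while keeping the likelihoods at 0.80 and 0.20. It is an illustrative Python example; the grid of priors is an arbitrary choice, not part of the exercise.

```python
def posterior(prior, lik_if_true, lik_if_false):
    """Two-class Bayes update: P(A|B) from P(A), P(B|A), and P(B|not A)."""
    evidence = lik_if_true * prior + lik_if_false * (1 - prior)
    return lik_if_true * prior / evidence


# Hypothetical grid of priors; likelihoods fixed at the worked-example values.
priors = [0.10, 0.30, 0.50, 0.70, 0.90]
posteriors = [posterior(p, 0.80, 0.20) for p in priors]

for p, post in zip(priors, posteriors):
    print(f"prior {p:.2f} -> posterior {post:.3f}")
```

Every posterior in the sweep is higher than the one before it, and none reaches 1.0, because low-risk borrowers still account for some missed payments.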

06
Section 6
Counting Rules
Five techniques for counting the number of ways events can occur — essential for computing classical probabilities in audit sampling and portfolio problems.
Counting Rules
Why Counting Matters
Classical Probability Link

P(Event) = Favourable outcomes ÷ Total outcomes. Counting rules help us compute both efficiently when there are too many outcomes to list.

Accounting Examples

• How many ways can an auditor select 3 accounts from 12?
• How many loan product combinations does the bank offer?
• How many ways can 5 candidates be ranked for a single role?

Rule 1

Multiplication (mn)

m choices for first, n for second → m × n total combinations.

m × n × p × ...
Rule 2

Permutations (all n)

Arrange all n distinct objects in order. Order matters.

n!
Rule 3

Permutations (r of n)

Arrange r objects chosen from n. Order matters.

n! / (n−r)!
Rule 4

Combinations (r of n)

Choose r from n. Order does NOT matter.

n! / [r!(n−r)!]
Rule 5

Partition Rule

Divide n objects into k groups of fixed sizes n₁, n₂, ..., nₖ.

n! / (n₁! × n₂! × ... × nₖ!)
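The five rules map closely onto Python's standard `math` module (`math.prod`, `math.perm`, and `math.comb` require Python 3.8 or later). The wrapper names below are illustrative assumptions; the last two calls answer two of the accounting questions posed above.

```python
import math


def multiplication(*choices):
    """Rule 1: m × n × p × ... total combinations."""
    return math.prod(choices)


def permutations_all(n):
    """Rule 2: n! orderings of n distinct objects."""
    return math.factorial(n)


def permutations_r(n, r):
    """Rule 3: n! / (n−r)! ordered selections of r from n."""
    return math.perm(n, r)


def combinations_r(n, r):
    """Rule 4: n! / [r!(n−r)!] unordered selections of r from n."""
    return math.comb(n, r)


def partitions(*group_sizes):
    """Rule 5: n! / (n1! × n2! × ... × nk!), where n is the sum of the group sizes."""
    n = sum(group_sizes)
    denominator = math.prod(math.factorial(size) for size in group_sizes)
    return math.factorial(n) // denominator


# Select 3 accounts from 12, order irrelevant.
accounts_samples = combinations_r(12, 3)
# Rank 5 candidates from first to last.
candidate_rankings = permutations_all(5)

print("Samples of 3 accounts from 12:", accounts_samples)
print("Rankings of 5 candidates:", candidate_rankings)
```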
Counting Rules
Permutations vs Combinations
The Only Question You Need

Does order matter? Yes → Permutation. No → Combination.

Memory Hook

Permutation = Positional (1st, 2nd, 3rd — roles matter). Combination = Committee (only who is selected, not their role).

Permutation — Order Matters

Scenario: An audit firm assigns the top 3 candidates from 8 finalists to Senior, Manager, and Analyst roles.

Formula: P(8,3) = 8! / (8−3)! = 8×7×6

= 336 ordered arrangements

ABC ≠ BAC → different role assignments

Combination — Order Irrelevant

Scenario: An auditor selects 3 accounts from 8 for random testing. Any group of 3 is equally valid.

Formula: C(8,3) = 8! / (3! × 5!)

= 56 possible groups

ABC = BAC → same accounts selected
Ratio Insight

C(8,3) = P(8,3) / 3! = 336 / 6 = 56. For r ≥ 2 there are always fewer combinations than permutations, because we divide out the r! orderings of the selected items.
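For small cases the counts can be verified by brute force. The sketch below lists every ordered and unordered selection of 3 from 8 using `itertools`; the labels A to H are hypothetical stand-ins for the finalists or accounts.

```python
from itertools import combinations, permutations

items = "ABCDEFGH"  # 8 hypothetical labels

ordered = list(permutations(items, 3))    # order matters
unordered = list(combinations(items, 3))  # order irrelevant

print("Ordered arrangements:", len(ordered))
print("Unordered groups:    ", len(unordered))

# Every unordered group {A, B, C} shows up once per ordering among the permutations.
orderings_of_abc = [p for p in ordered if set(p) == {"A", "B", "C"}]
print("Orderings of A, B, C:", len(orderings_of_abc))
```

Enumeration is only practical for small n. For realistic audit populations the formulas are the workable route.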

Counting Rules
Applied Examples
Rule 1 — Multiplication

A bank offers 3 loan types × 4 repayment terms × 2 rate structures. How many distinct loan products?

3 × 4 × 2 = 24 products
Rule 4 — Combinations

An auditor selects 4 accounts from a population of 10 for detailed testing. How many different samples are possible?

C(10,4) = 10!/(4!·6!) = 210
Rule 5 — Partition

12 loan files must be distributed among 3 auditors — 4 files each. In how many ways can this be done?

12!/(4!·4!·4!) = 34,650

Decision guide: Is order important (rankings, roles)? → Permutation. Is only the group what matters? → Combination. Distributing n items into fixed groups? → Partition.
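The partition result can be cross-checked by building the allocation one auditor at a time, which also shows where the formula comes from. This Python sketch assumes the three auditors are distinguishable, so it matters which auditor receives which set of files; variable names are illustrative.

```python
import math

# Rule 1: 3 loan types × 4 repayment terms × 2 rate structures.
loan_products = math.prod([3, 4, 2])

# Rule 4: choose 4 accounts from 10, order irrelevant.
audit_samples = math.comb(10, 4)

# Rule 5, built sequentially: auditor 1 takes 4 of 12 files,
# auditor 2 takes 4 of the remaining 8, auditor 3 takes the last 4.
sequential = math.comb(12, 4) * math.comb(8, 4) * math.comb(4, 4)

# Rule 5, direct formula: 12! / (4! × 4! × 4!).
formula = math.factorial(12) // (math.factorial(4) ** 3)

print("Loan products:", loan_products)
print("Audit samples:", audit_samples)
print("File allocations (sequential):", sequential)
print("File allocations (formula):   ", formula)
```

The two partition counts agree because the (n−r)! terms in successive combinations cancel, leaving only n! over the product of the group-size factorials.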

Quick Check — Section 6
A credit committee of 3 members must be chosen from 7 senior analysts. The committee has no designated roles — all members are equal. How many different committees are possible?
A P(7,3) = 210 — use permutations because selection is sequential
B C(7,3) = 35 — use combinations because all roles are equal
C 7³ = 343 — apply the multiplication rule
D 7! = 5,040 — arrange all 7 analysts

B is correct. Since committee members have no designated roles, only who is selected matters — not the order of selection. That's combinations: C(7,3) = 7! / (3! × 4!) = 5040 / (6 × 24) = 35. If the problem instead asked for a Chair, Deputy, and Secretary from 7 analysts, roles would differ → P(7,3) = 210. When in doubt: no roles = combinations; specific roles = permutations.

Week 4
Key Takeaways

Probability Foundations

Three approaches: classical (equal likelihood), relative frequency (past data), subjective (expert judgment). Always 0 ≤ P(A) ≤ 1.

Contingency Tables

Marginal probabilities from row/column totals. Joint probabilities from interior cells. Both divided by the grand total.

Addition Rule

P(A∪B) = P(A) + P(B) − P(A∩B). Subtract overlap to avoid double-counting. Mutually exclusive: no subtraction needed.

Conditional Probability

P(A|B) = P(A∩B)/P(B). Restrict the sample space to B. Independence: P(A|B) = P(A) — knowing B tells us nothing new about A.

Bayes' Theorem

Prior × Likelihood → Posterior. Formal framework for updating default risk estimates as new borrower information arrives.

Counting Rules

Multiplication, Permutations (n or r of n), Combinations, Partitions. Key question every time: does order matter?

Week 4 Complete
Probability is how
we price uncertainty.
Every credit score, audit sample, and loan loss provision
rests on the concepts covered today.
Next Week
Week 5 — Probability Distributions:
Binomial, Poisson & Normal