A bank wants to know: how likely is a new applicant to default? Probability gives us a rigorous language to answer this.
Every loan approval model, credit score, and loan loss provision rests on probability estimates.
Classical: count favourable outcomes ÷ total outcomes. Assumes all outcomes are equally likely.
Relative frequency: the proportion of times a specific outcome is observed over many trials.
Subjective: used when no repeatable experiment exists; based on informed belief.
500 loan applications. Each applicant is either employed or unemployed, and either defaults or does not default.
Experiment: Randomly selecting one applicant.
Sample space (S): All possible outcomes.
Event (A): A subset of S we care about.
B is correct. The bank is computing the proportion of a specific outcome (default) over many observed trials — that's the relative frequency approach. Classical probability requires all outcomes to be equally likely (not true here — not every applicant has the same default risk). Subjective probability uses expert judgment rather than observed data.
A commercial bank reviews 500 closed loan accounts, classified by employment status at application and whether the loan ultimately defaulted.
A contingency table reveals how two categorical variables relate — essential for credit risk segmentation and loan pricing.
| Employment Status | Default | No Default | Row Total |
|---|---|---|---|
| Employed | 40 | 320 | 360 |
| Unemployed | 80 | 60 | 140 |
| Column Total | 120 | 380 | 500 |
Marginal probability: the probability of one event alone, ignoring the other variable. Found in the row/column totals (the "margins") divided by the grand total, e.g. P(Default) = 120/500 = 0.24.
| Employment | Default | No Default | Row Total |
|---|---|---|---|
| Employed | 40 | 320 | 360 |
| Unemployed | 80 | 60 | 140 |
| Total | 120 | 380 | 500 |
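As a minimal sketch, the marginal probabilities can be read off the table in a few lines of Python. The dictionary layout and the helper name `marginal` are illustrative choices, not part of the original material.

```python
# Cell counts copied from the contingency table above.
counts = {
    ("Employed", "Default"): 40,
    ("Employed", "No Default"): 320,
    ("Unemployed", "Default"): 80,
    ("Unemployed", "No Default"): 60,
}
grand_total = sum(counts.values())  # 500 accounts


def marginal(label):
    """Add up every cell in the row or column named `label`, then divide by the grand total."""
    return sum(n for cell, n in counts.items() if label in cell) / grand_total


for label in ("Employed", "Unemployed", "Default", "No Default"):
    print(f"P({label}) = {marginal(label):.2f}")
# Prints 0.72, 0.28, 0.24 and 0.76 respectively.
```

Each pair of marginals (employment status, default outcome) sums to 1, which is a quick sanity check on the table.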
Joint probability: the probability of two events occurring together. Found in the interior cells of the table (not the margins), divided by the grand total.
| Employment | Default | No Default | Row Total |
|---|---|---|---|
| Employed | 40 | 320 | 360 |
| Unemployed | 80 | 60 | 140 |
| Total | 120 | 380 | 500 |
Joint probabilities identify the riskiest combinations. Unemployed & Default (0.16) is twice as likely as Employed & Default (0.08).
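The joint probabilities quoted above can be reproduced with a short sketch like the one below. The variable names are illustrative assumptions; the counts are the ones in the table.

```python
# Cell counts copied from the contingency table above.
counts = {
    ("Employed", "Default"): 40,
    ("Employed", "No Default"): 320,
    ("Unemployed", "Default"): 80,
    ("Unemployed", "No Default"): 60,
}
grand_total = sum(counts.values())

# Joint probability of each interior cell = cell count / grand total.
joint = {cell: n / grand_total for cell, n in counts.items()}

p_unemp_default = joint[("Unemployed", "Default")]  # 80 / 500 = 0.16
p_emp_default = joint[("Employed", "Default")]      # 40 / 500 = 0.08
print(p_unemp_default / p_emp_default)  # 2.0
```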
What is the probability that a randomly selected applicant is either unemployed OR defaults?
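The General Addition Rule, P(A∪B) = P(A) + P(B) − P(A∩B), gives 140/500 + 120/500 − 80/500 = 0.36. Without subtracting the overlap we'd double-count the 80 applicants who are both unemployed and in default.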
Special case — Mutually Exclusive: If A and B cannot both occur, P(A∩B) = 0, so P(A∪B) = P(A) + P(B). Example: a loan can't be both "approved" and "rejected" simultaneously.
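A small sketch of the addition rule applied to the "unemployed OR defaults" question, using the table counts. The function name `prob_union` and its default argument are illustrative choices.

```python
def prob_union(p_a, p_b, p_a_and_b=0.0):
    """General addition rule; leave p_a_and_b at 0 for mutually exclusive events."""
    return p_a + p_b - p_a_and_b


n = 500
p_unemployed = 140 / n
p_default = 120 / n
p_both = 80 / n  # unemployed AND defaulted

p_either = prob_union(p_unemployed, p_default, p_both)
print(round(p_either, 2))  # 0.36

# Cross-check by counting the three qualifying cells directly: 80 + 60 + 40.
assert abs(p_either - (80 + 60 + 40) / n) < 1e-12
```

Dropping the third argument reproduces the double-counting error: 0.28 + 0.24 = 0.52 instead of 0.36.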
General: P(A∩B) = P(A) × P(B|A). Works for any two events.
Special (independent): P(A∩B) = P(A) × P(B). Applies when the events don't affect each other.
Probability that two independent internal controls both fail simultaneously.
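The sketch below checks the general multiplication rule against the loan table, then applies the independent-events shortcut to the internal-controls example. The two control failure rates are hypothetical numbers chosen only for illustration; the source gives none.

```python
# General rule, checked against the loan table: P(U and D) = P(U) * P(D | U).
p_unemployed = 140 / 500
p_default_given_unemployed = 80 / 140
p_unemployed_and_default = p_unemployed * p_default_given_unemployed
print(round(p_unemployed_and_default, 2))  # 0.16

# Special rule for independent events: P(A and B) = P(A) * P(B).
# Both failure rates below are hypothetical, for illustration only.
p_control_a_fails = 0.05
p_control_b_fails = 0.02
p_both_fail = p_control_a_fails * p_control_b_fails
print(round(p_both_fail, 4))  # 0.001
```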
C is correct. General Addition Rule: P(Employed ∪ No Default) = P(E) + P(ND) − P(E∩ND) = 360/500 + 380/500 − 320/500 = 0.72 + 0.76 − 0.64 = 0.84. Option A forgets to subtract the overlap entirely. Option B subtracts P(Employed ∩ Default) = 0.08 — the wrong cell. Option D uses multiplication, which gives "and" not "or".
We already know an applicant is unemployed. How does this change our estimate of their default probability?
| Employment | Default | No Default | Row Total |
|---|---|---|---|
| Employed | 40 | 320 | 360 |
| Unemployed | 80 | 60 | 140 |
| Total | 120 | 380 | 500 |
P(Default|Unemployed) = 80/140 ≈ 0.571, compared with P(Default|Employed) = 40/360 ≈ 0.111. Unemployment raises default risk roughly fivefold (0.571 / 0.111 ≈ 5.1). This is exactly what credit models should capture in their segmentation.
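Conditioning amounts to restricting the sample space to one row of the table. A minimal sketch, with the helper name `conditional` as an illustrative assumption:

```python
def conditional(n_both, n_given):
    """P(A | B) from counts: restrict attention to the B row, then take A's share of it."""
    return n_both / n_given


p_default_given_unemployed = conditional(80, 140)
p_default_given_employed = conditional(40, 360)

print(f"{p_default_given_unemployed:.3f}")  # 0.571
print(f"{p_default_given_employed:.3f}")    # 0.111
print(f"{p_default_given_unemployed / p_default_given_employed:.1f}")  # 5.1
```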
Tree diagrams map all paths through sequential events and make conditional probabilities visual.
Branches show conditional probabilities. Multiplying along a path gives the joint probability at the leaf.
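One way to sketch the tree for the loan data is as nested dictionaries, assuming employment status is the first branch and default outcome the second. The names `first`, `second`, and `leaves` are illustrative.

```python
# First-level branches: P(employment status).
first = {"Employed": 360 / 500, "Unemployed": 140 / 500}

# Second-level branches: P(outcome | employment status).
second = {
    "Employed": {"Default": 40 / 360, "No Default": 320 / 360},
    "Unemployed": {"Default": 80 / 140, "No Default": 60 / 140},
}

# Multiply along each path to get the joint probability at the leaf.
leaves = {
    (status, outcome): first[status] * p
    for status, branches in second.items()
    for outcome, p in branches.items()
}

for path, p in leaves.items():
    print(path, round(p, 2))

# The four leaves cover every path, so they should sum to 1.
print(round(sum(leaves.values()), 10))  # 1.0
```

The leaf values match the interior cells of the contingency table divided by 500, which is a useful check that the branch probabilities were entered correctly.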
A and B are independent if:
P(A|B) = P(A), or equivalently:
P(A∩B) = P(A) × P(B)
Our data: P(Default|Unemployed) = 80/140 = 0.571
P(Default) = 0.24
0.571 ≠ 0.24
Employment status does affect default risk. Events are dependent.
Hypothetical independent case: P(Default|Unemployed) = 0.24
P(Default) = 0.24
0.24 = 0.24
Knowing employment status tells us nothing new about default risk.
If employment and default were independent, segmenting loans by employment status would add zero value. Our data shows strong dependence — so segmentation genuinely improves risk pricing.
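Both forms of the independence test can be run on the table counts. This is a sketch with illustrative variable names; `math.isclose` is used so that floating-point rounding does not affect the comparison.

```python
import math

n = 500
p_default = 120 / n
p_unemployed = 140 / n
p_both = 80 / n  # unemployed AND defaulted

# Test 1: does conditioning on unemployment change P(Default)?
p_default_given_unemployed = p_both / p_unemployed
print(f"{p_default_given_unemployed:.3f} vs {p_default:.2f}")  # 0.571 vs 0.24

# Test 2 (equivalent): does the joint equal the product of the marginals?
independent = math.isclose(p_both, p_unemployed * p_default)
print(independent)  # False

# Count the Unemployed & Default cell would hold if the events were independent.
expected_if_independent = n * p_unemployed * p_default
print(round(expected_if_independent, 1))  # 33.6
```

The observed cell holds 80 accounts against roughly 34 expected under independence, which is the gap that makes segmentation by employment status worthwhile.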
A borrower just missed their first payment. Should the bank reclassify them as high risk? By exactly how much should the estimated default probability increase?
Foundation of credit scoring models, fraud detection systems, and dynamic loan provisioning — any system that updates beliefs with new data.
30% of applicants are classified High Risk (HR).
P(Missed Payment | HR) = 0.80
P(Missed Payment | Low Risk) = 0.20
A borrower misses month-1 payment. What is P(HR | Missed Payment)?
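P(Missed Payment) = (0.80)(0.30) + (0.20)(0.70) = 0.24 + 0.14 = 0.38, so P(HR | Missed Payment) = 0.24 / 0.38 ≈ 0.632. One missed payment more than doubles the estimated high-risk probability (0.30 → 0.632). This is why early payment behaviour is so predictive in credit scoring models.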
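The update can be sketched as a two-class Bayes calculation. The function name `posterior` and its parameter names are illustrative; the inputs are the prior and likelihoods given above.

```python
def posterior(prior, lik_if_hr, lik_if_lr):
    """Bayes' theorem for two classes: P(HR | evidence)."""
    # Total probability of the evidence across both risk classes.
    evidence = lik_if_hr * prior + lik_if_lr * (1 - prior)
    return lik_if_hr * prior / evidence


p_hr_given_missed = posterior(prior=0.30, lik_if_hr=0.80, lik_if_lr=0.20)
print(f"{p_hr_given_missed:.3f}")  # 0.632
```

Calling the same function with a different prior shows how sensitive the posterior is to the starting belief.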
B is correct. With P(HR) = 0.50: P(MP) = (0.80)(0.50) + (0.20)(0.50) = 0.40 + 0.10 = 0.50. P(HR|MP) = (0.80 × 0.50) / 0.50 = 0.80. The posterior rose from 0.632 to 0.80. For fixed likelihoods, a higher prior always yields a higher posterior, because Bayes' theorem combines the prior belief with the likelihood evidence. Option D is wrong: the posterior can reach 1.0 only if P(MP|LR) = 0 (or the prior is already 1), and here P(MP|LR) = 0.20.
P(Event) = Favourable outcomes ÷ Total outcomes. Counting rules help us compute both efficiently when there are too many outcomes to list.
• How many ways can an auditor select 3 accounts from 12?
• How many loan product combinations does the bank offer?
• How many ways can 5 candidates be ranked for a single role?
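Multiplication rule: m choices for the first step, n for the second → m × n total combinations.
Permutations of n: n! ways to arrange all n distinct objects in order. Order matters.
Permutations of r from n: P(n,r) = n! / (n−r)! ways to arrange r objects chosen from n. Order matters.
Combinations: C(n,r) = n! / (r! × (n−r)!) ways to choose r from n. Order does NOT matter.
Partitions: n! / (n₁! × n₂! × ... × nₖ!) ways to divide n objects into k groups of fixed sizes n₁, n₂, ..., nₖ.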
Does order matter? Yes → Permutation. No → Combination.
Permutation = Positional (1st, 2nd, 3rd — roles matter). Combination = Committee (only who is selected, not their role).
Scenario: An audit firm assigns the top 3 candidates from 8 finalists to Senior, Manager, and Analyst roles.
Formula: P(8,3) = 8! / (8−3)! = 8 × 7 × 6 = 336 ordered arrangements
Scenario: An auditor selects 3 accounts from 8 for random testing. Any group of 3 is equally valid.
Formula: C(8,3) = 8! / (3! × 5!) = 56 possible groups
C(8,3) = P(8,3) / 3! = 336 / 6 = 56. Combinations never outnumber permutations, and there are strictly fewer whenever r ≥ 2, because we divide out the r! orderings of the selected items.
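Both counts can be checked with Python's standard library. This sketch assumes Python 3.8 or later, where `math.perm` and `math.comb` are available.

```python
import math

ordered = math.perm(8, 3)    # roles matter: Senior, Manager, Analyst
unordered = math.comb(8, 3)  # only the group of 3 matters
print(ordered, unordered)    # 336 56

# Each unordered group of 3 corresponds to 3! = 6 ordered arrangements.
assert unordered == ordered // math.factorial(3)
```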
A bank offers 3 loan types × 4 repayment terms × 2 rate structures. How many distinct loan products?
An auditor selects 4 accounts from a population of 10 for detailed testing. How many different samples are possible?
12 loan files must be distributed among 3 auditors — 4 files each. In how many ways can this be done?
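A possible way to work the three questions above, again assuming Python 3.8+ for `math.comb`. The partition count treats the three auditors as distinguishable, since it matters which auditor receives which set of files; the variable names are illustrative.

```python
import math

# Multiplication rule: 3 loan types x 4 repayment terms x 2 rate structures.
loan_products = 3 * 4 * 2

# Combination: 4 accounts from 10, order irrelevant.
audit_samples = math.comb(10, 4)

# Partition: 12 files into 3 labelled groups of 4, i.e. 12! / (4! * 4! * 4!).
file_allocations = math.factorial(12) // (math.factorial(4) ** 3)

print(loan_products, audit_samples, file_allocations)  # 24 210 34650
```

The partition result can also be built step by step as C(12,4) × C(8,4) × C(4,4) = 495 × 70 × 1, which gives the same 34,650.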
Decision guide: Is order important (rankings, roles)? → Permutation. Is only the group what matters? → Combination. Distributing n items into fixed groups? → Partition.
B is correct. Since committee members have no designated roles, only who is selected matters — not the order of selection. That's combinations: C(7,3) = 7! / (3! × 4!) = 5040 / (6 × 24) = 35. If the problem instead asked for a Chair, Deputy, and Secretary from 7 analysts, roles would differ → P(7,3) = 210. When in doubt: no roles = combinations; specific roles = permutations.
Three approaches: classical (equal likelihood), relative frequency (past data), subjective (expert judgment). Always 0 ≤ P(A) ≤ 1.
Marginal probabilities from row/column totals. Joint probabilities from interior cells. Both divided by the grand total.
P(A∪B) = P(A) + P(B) − P(A∩B). Subtract overlap to avoid double-counting. Mutually exclusive: no subtraction needed.
P(A|B) = P(A∩B)/P(B). Restrict the sample space to B. Independence: P(A|B) = P(A) — knowing B tells us nothing new about A.
Prior × Likelihood → Posterior. Formal framework for updating default risk estimates as new borrower information arrives.
Multiplication, Permutations (n or r of n), Combinations, Partitions. Key question every time: does order matter?