Week 4: From Effects to Decisions

Synthetic Control, Policy Trees and Counterfactual Analysis

DATA5000 — Artificial Intelligence Programming in Business Analytics

Kaplan Business School

Overview

What We Cover Today

Counterfactual Foundations

Why every causal question is a "what if?" question, and why the answer is always unobservable.

Synthetic Control Model

How to evaluate a policy when there is no control group — by building a synthetic twin of the treated region.

Policy Trees

How to turn 12,000 personalised CATE estimates into one clear targeting rule a manager can act on.

The Unifying Business Question

TeleConnect changed its pricing structure in Western Australia six months ago. Did it work? Who did it work for? And what should the retention team do next quarter?

SCM answers "did it work overall." Policy trees answer "for whom." Counterfactual analysis answers "what should we do instead."

These three tools pick up precisely where Week 3 left off — DML gave us individual treatment effects; today we learn what to do with them at scale.

1 — Bridge from Week 3

What Week 3 Gave Us

Tool	What it does	Output
LightGBM + SHAP	Predicts churn; explains which features drive the prediction	Feature importance rankings (correlation-based)
LinearDML (EconML)	Estimates the causal effect of one action on one outcome, controlling for confounders	ATE, ATT, and individual CATE values for each customer

What We Learned to Do

Distinguish correlation from causation using residualisation (DML)
Identify confounders that bias naive estimates
Estimate how much the proactive support call causally reduces each customer's churn probability

What DML Cannot Tell Us

DML requires treated and untreated customers observed at the same time. But what if the policy applied to everyone in a region simultaneously? And DML gives us 12,000 individual CATE numbers — how do we turn those into an operational targeting decision?

Week 4 answers both questions.

The Gap Between Effects and Decisions

What Week 3 Left Open

DML needs a concurrent control group — what if the whole region was treated at once?
12,000 CATE estimates are precise but not actionable by a call-centre team
We can say "the call helps on average" but cannot say "call these customers and not those"
We cannot answer "what would have happened if we had done nothing?"

What Week 4 Adds

SCM — evaluate region-wide or market-wide policies with no simultaneous control
Policy trees — compress CATE estimates into simple, auditable decision rules
Counterfactual analysis — rigorously quantify "what if?" scenarios for different strategies

The Through-Line

All three tools are answering the same underlying question from different angles: what is the best action to take, for whom, and how do we know?

The Scenario: TeleConnect Western Australia

What Happened

Six months ago TeleConnect launched a new pricing structure in Western Australia — the first and only state where it was applied. Monthly charges were restructured, and a loyalty discount was introduced for customers with more than 12 months of tenure.

TeleConnect's leadership now wants answers to three questions:

Did the pricing change reduce churn in WA overall? — and by how much, compared to a world where it never happened?
Which types of customers responded most? — and can we build a targeting rule for the next retention campaign?
What would have happened if we had only targeted the top 30% of likely-responders — instead of the whole state?

Tool Mapping

Question 1 → Synthetic Control Model

Question 2 → Policy Tree (CATE-based)

Question 3 → Counterfactual Analysis

Section 2

Counterfactual Foundations

Before we build any model, we need to understand the philosophical problem every causal tool is trying to solve.

2 — Counterfactual Foundations

The Fundamental Problem of Causal Inference

The Problem in One Sentence

To know whether an action caused an outcome, you need to compare what happened with the action against what would have happened without it — but you can only observe one of those two worlds.

Figure 1. Every customer exists in only one of the two paths. The other is forever unobservable.

This is called the Fundamental Problem of Causal Inference (Holland, 1986). Every causal method, whether DML, SCM or a randomised experiment, is a different strategy for estimating the unobservable path as precisely as possible.

What Is a Counterfactual?

Definition

A counterfactual is the outcome that would have occurred in an alternative version of events that did not actually happen. It answers the question: "What if things had been different?"

Everyday Counterfactuals

"If I hadn't studied, would I have passed?"
"If we hadn't hired the extra support staff, how long would wait times have been?"
"If interest rates had stayed low, would house prices have risen so fast?"

These feel natural to ask. They are almost impossible to answer rigorously without the right tools.

Business Counterfactuals

"If TeleConnect had not changed WA pricing, what would the churn rate be today?"
"If we had targeted only the top 30% of CATE customers, how many churns would we have prevented?"
"If the campaign had launched two months earlier, how much more retention would we have achieved?"

The Doctor Analogy

A doctor prescribes chemotherapy. The patient recovers. Did chemotherapy cause the recovery — or would the patient have recovered anyway? The counterfactual world (no chemotherapy) is unobservable. All clinical trials, all causal AI, and all the tools we study this week are attempting to rigorously construct this unobservable comparison.

Why We Can Never Observe the Counterfactual Directly

The One-Reality Problem

At any moment in time, each customer, state, or business unit exists in exactly one condition: treated or untreated. Once WA receives the pricing change, "WA without the pricing change" no longer exists to observe. Time cannot be rewound.

What We Can Observe

WA churn rate after the pricing change
WA churn rate before the pricing change
Other states' churn rates throughout the same period

What We Cannot Observe

WA churn rate after the launch date, had the pricing change never happened
Any customer's outcome under both treatment and no-treatment simultaneously

This is the "missing data" problem that all causal tools solve through estimation.

The Trap: Before vs After

The simplest (and most dangerous) approach is to compare WA's churn rate before and after the pricing change. But this conflates the pricing effect with everything else that changed during that period: seasonality, competitor moves, economic conditions, service upgrades. A good counterfactual must control for all of these.

Three Tools, Three Counterfactual Questions

Tool	Counterfactual question it answers	Level of analysis	When to use
Synthetic Control (SCM)	"What would WA's churn rate have been if the pricing change had never happened?"	Aggregate: region, market, or time period	Whole-of-region policies with no concurrent control group
DML + CausalForest (Week 3)	"What would Customer #4821's churn probability be if we changed their contract type?"	Individual: each customer's causal response	Observational data with treated and untreated customers observed simultaneously
Policy Tree + Policy Value	"What would total churn be under a different targeting rule than the one we used?"	Population: the entire customer base under a different decision	Translating CATE estimates into operational targeting decisions

These tools are complementary, not competing. A complete analysis uses all three: SCM to validate the overall policy, CATE to personalise the response, and policy trees to operationalise the recommendation.

Knowledge Checkpoint 1

Counterfactual Thinking

"TeleConnect's WA churn rate fell from 35% to 22% in the six months following the new pricing structure."

Q1

A manager says: "The pricing change cut churn by 13pp — it clearly worked." What critical piece of information is missing from this conclusion?

The exact size of the pricing change
What WA's churn rate would have been without the pricing change (the counterfactual)
The churn rate in NSW at the same time
The total number of WA customers affected

Q2

WA's pricing change was applied to every WA customer simultaneously. There is no group of WA customers who did not receive it. Which tool is most appropriate?

DML (Double Machine Learning)
A/B randomised experiment
Synthetic Control Model (SCM)
SHAP feature importance

Q3

Why is a simple before/after comparison of WA churn rates not sufficient to attribute the drop to the pricing change?

Q4

In your own words, why is the counterfactual called "synthetic" when SCM constructs it?

Section 3

Synthetic Control Model

How do you evaluate a policy when the entire region was treated — and there is no untreated group to compare against?

3 — Synthetic Control Model

The Problem SCM Solves

The Setup

TeleConnect changed pricing in WA only. There are no WA customers who did not receive the change. The counterfactual — "WA without the pricing change" — has no direct empirical counterpart.

Why DML Does Not Apply Here

DML needs a treated group and a control group observed at the same point in time. In WA, everyone is treated. There is no concurrent control individual to form the residual comparison.

Why A/B Testing Is Not Available

TeleConnect cannot randomly assign some WA customers to a "no pricing change" condition — the policy was a commercial decision applied uniformly. The experiment was never designed; it just happened.

SCM's Answer

Use data from other states that did not receive the pricing change — NSW, VIC, QLD, SA — to construct a synthetic version of WA that mimics what WA would have looked like in the absence of the intervention. The gap between real WA and its synthetic twin, measured after the policy launch, is the causal effect.

Why Simple Comparisons Break Down

Before/After Comparison

What it does: Compare WA's churn rate in months 1–6 (pre) vs. months 7–12 (post).

Why it fails: The drop may be partly or entirely driven by factors unrelated to the pricing change — seasonal patterns, a competitor's service outage, national economic conditions, or TeleConnect's own service improvements that coincided with the launch.

WA vs. NSW Direct Comparison

What it does: Compare WA's churn in the post-period to NSW's churn in the post-period.

Why it fails: NSW may differ from WA in customer demographics, competitive landscape, urbanisation, income levels, or TeleConnect's network quality. These pre-existing differences will contaminate the estimate.

The Key Insight

A valid comparison unit must have been on the same trajectory as WA before the policy. No single state is a perfect match. But a weighted blend of multiple states might be — and SCM finds those weights systematically.

The Synthetic Twin: Intuition

The Diet Analogy

You want to know whether your new diet is working. You cannot un-eat the food. But you could find several friends who share your starting weight, metabolism, activity level, and eating habits — and combine their outcomes into a composite "you-without-the-diet." The difference between your actual weight and your synthetic twin's weight after six weeks is your diet's causal effect.

Figure 2. SCM constructs a synthetic WA from a weighted blend of untreated donor states, chosen to match WA's pre-policy behaviour.

How SCM Chooses the Weights

The Optimisation Problem

SCM finds the weights $w_{\text{NSW}}, w_{\text{VIC}}, w_{\text{QLD}}, \ldots$ that minimise the difference between real WA and the synthetic WA during the pre-treatment period, subject to the weights being non-negative and summing to 1.
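Written out as a formula, the problem looks like this. The notation is ours, not the source's: $T_0$ is the number of pre-policy months, $Y_{\text{WA},t}$ is WA's churn rate in month $t$, and $Y_{j,t}$ is the churn rate of donor state $j \in \{\text{NSW}, \text{VIC}, \text{QLD}, \text{SA}\}$. For simplicity it matches on the churn series only.

$$
\min_{w} \; \sum_{t=1}^{T_0} \Big( Y_{\text{WA},t} - \sum_{j} w_j \, Y_{j,t} \Big)^2
\quad \text{subject to} \quad w_j \ge 0, \;\; \sum_{j} w_j = 1
$$

Once the weights are fixed, synthetic WA in any post-policy month $t$ is $\hat{Y}_{\text{WA},t} = \sum_j w_j Y_{j,t}$, and the estimated effect in that month is $Y_{\text{WA},t} - \hat{Y}_{\text{WA},t}$.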

What Goes Into the Matching

Pre-policy churn rate trajectories (month by month)
Relevant customer characteristics: average tenure, contract mix, charges
Any pre-treatment outcome predictors the analyst chooses to include

What the Weights Mean

In our example, NSW gets a weight of 0.45 because its pre-policy churn trend looks most like WA's. SA gets a weight of only 0.05 because it is much less similar. A state that looks nothing like WA may receive a weight of zero.

The weights are not chosen by hand — they are the solution to a constrained least-squares problem.
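A minimal Python sketch of that constrained least-squares problem follows. It makes several assumptions. It matches only on the monthly pre-policy churn series, with no customer covariates. The donor and WA numbers are invented for illustration. The solver, accelerated projected gradient descent with a projection onto the set of non-negative weights summing to 1, is a generic choice made for brevity. Packages such as SparseSC may solve the problem differently. The function names are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]  # last index that stays positive
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def fit_scm_weights(y_pre, X_pre, n_iter=5000):
    """Minimise ||y_pre - X_pre @ w||^2 subject to w >= 0 and sum(w) = 1.

    y_pre: treated unit's pre-period outcomes, shape (T0,)
    X_pre: donor units' pre-period outcomes, shape (T0, J)
    """
    J = X_pre.shape[1]
    step = 1.0 / (2.0 * np.linalg.norm(X_pre, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.full(J, 1.0 / J)                              # start from equal weights
    z, t = w.copy(), 1.0
    for _ in range(n_iter):
        grad = 2.0 * X_pre.T @ (X_pre @ z - y_pre)
        w_next = project_to_simplex(z - step * grad)     # gradient step, then enforce constraints
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w_next + ((t - 1.0) / t_next) * (w_next - w)  # momentum (FISTA) update
        w, t = w_next, t_next
    return w


# Hypothetical pre-period monthly churn (%) for months 1-6; columns = NSW, VIC, QLD, SA
donors_pre = np.array([
    [36.0, 38.5, 32.0, 40.2],
    [35.5, 39.0, 31.6, 40.5],
    [36.2, 38.2, 32.4, 39.8],
    [35.8, 38.8, 31.9, 40.1],
    [35.1, 39.1, 32.2, 40.6],
    [35.4, 38.6, 31.8, 40.3],
])
# Hypothetical WA series built as an exact blend of the donors, so a near-perfect fit exists
wa_pre = donors_pre @ np.array([0.45, 0.30, 0.20, 0.05])

weights = fit_scm_weights(wa_pre, donors_pre)
rmspe = np.sqrt(np.mean((wa_pre - donors_pre @ weights) ** 2))  # pre-period fit, in pp
print({name: round(float(w), 3) for name, w in zip(["NSW", "VIC", "QLD", "SA"], weights)},
      round(float(rmspe), 4))
```

Because the hypothetical WA series is an exact blend of the donors, the pre-period RMSPE should come out close to zero. With real data it will not, and that residual is the pre-period fit statistic used to judge whether the synthetic control is credible.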

Pre-Treatment Fit Is the Validity Check

Before trusting the post-policy gap, we verify that synthetic WA tracks real WA closely during the pre-policy period. If the lines do not match in the pre-period, the synthetic control is not credible — and the post-period gap should not be interpreted causally.

Pre-Policy Period: The Lines Must Match

Figure 3. During the pre-policy period (months 1–6), synthetic WA matches real WA closely. This validates the synthetic control before we examine the post-period.

The lines are nearly identical in the pre-period. This is the key validity check: if the synthetic WA could not track real WA before the policy, we would have no reason to trust it as a counterfactual after the policy.

After the Policy: The Gap Emerges

Figure 4. After the policy launches, real WA churn falls while synthetic WA continues on its natural trajectory. The gap at month 12 is approximately 17 percentage points — the estimated causal effect.

Reading the SCM Result

Metric	Value	Interpretation
Real WA churn at M12	22%	The actual observed outcome after the pricing change
Synthetic WA churn at M12	39%	Estimated counterfactual — what WA would look like without the change
SCM causal effect	−17 percentage points	The pricing change reduced WA's churn rate by an estimated 17pp at month 12
Pre-period fit (RMSPE)	0.4pp	The root mean squared gap between synthetic and real WA over the pre-period was 0.4pp: an excellent fit

What "−17pp" Means in Business Terms

If WA has 80,000 TeleConnect customers, a 17pp reduction in the monthly churn rate represents approximately 13,600 customers retained per month who would otherwise have left. At an average customer lifetime value of $400, that is about $5.4 million in preserved customer lifetime value for each month the effect persists.
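The business translation is simple arithmetic. The helper below uses the slide's own assumptions of 80,000 WA customers and a $400 average lifetime value. The function name is illustrative, not from any library.

```python
def scm_business_impact(real_pct, synthetic_pct, n_customers, clv_dollars):
    """Translate an SCM gap (percentage points) into customers retained and lifetime value."""
    effect_pp = real_pct - synthetic_pct            # negative = churn reduced by the policy
    retained = n_customers * -effect_pp / 100       # customers who would otherwise have left
    value = retained * clv_dollars                  # lifetime value preserved
    return effect_pp, retained, value


effect_pp, retained, value = scm_business_impact(22, 39, 80_000, 400)
print(effect_pp, retained, value)  # -17 13600.0 5440000.0
```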

Confidence in the Estimate

The pre-period RMSPE of 0.4pp means the synthetic control tracked WA to within about half a percentage point before the policy. This tight fit makes it credible that the 17pp post-period gap reflects the pricing change rather than pre-existing differences, provided the assumptions below also hold.

When to Trust SCM: Key Assumptions

Assumption	Plain language	What happens if violated
Close pre-period fit	The synthetic control must track the treated unit closely before the intervention	The post-period gap is not credibly attributable to the treatment
No spillover	The WA pricing change must not affect customer behaviour in the donor states (NSW, VIC, QLD, SA)	Donor states are contaminated, so the synthetic twin is biased
No anticipation	Customers must not have changed behaviour before the policy launched in response to knowing it was coming	The pre-period baseline is contaminated and the effect is understated
Stable donor pool	No major concurrent shocks hit the donor states (NSW, VIC, QLD, SA) that did not also hit WA	The synthetic WA's post-period trajectory no longer represents WA's counterfactual

The No-Spillover Assumption in Telecom

If WA customers tell NSW family members about the new pricing and those NSW customers switch providers, the donor pool is contaminated. TeleConnect should check whether NSW churn changed immediately after the WA launch as a diagnostic.

Knowledge Checkpoint 2

Synthetic Control Model

Synthetic WA = 0.45 × NSW + 0.30 × VIC + 0.20 × QLD + 0.05 × SA. Post-policy churn rates: NSW 34%, VIC 38%, QLD 32%, SA 40%. Real WA churn: 22%.

Q1

Calculate synthetic WA's estimated churn rate. Show your working.

Q2

Using your answer to Q1, what is the SCM's estimated causal effect of the WA pricing change?

Q3

What is the purpose of the pre-treatment matching period in SCM?

To measure the treatment effect directly
To calibrate weights so the synthetic control matches the treated unit before the intervention
To identify which confounders to include in the model

Q4

Name one SCM assumption that could be violated if WA customers frequently share TeleConnect plan advice with interstate family members.

Section 4

Policy Trees

SCM told us the pricing change worked overall. Now: who responded most — and how do we build a targeting rule a manager can actually follow?

4 — Policy Trees

The Problem: 12,000 CATEs, One Decision Needed

What DML Gave Us

EconML's CausalForest estimated a personalised treatment effect (CATE) for each of TeleConnect's 12,000 WA customers — the expected reduction in churn probability if each individual receives the proactive retention call.

The Operational Problem

A call-centre manager cannot process 12,000 individual scores. They need a rule: "Call customers who look like X. Do not call customers who look like Y."

Even if they could, a rule with thousands of conditions is not auditable, explainable, or robust to new customers.

What We Need

A small number of interpretable rules — ideally expressible as a short decision tree — that:

Assign each customer to a recommended action
Maximise total churn reduction across all customers
Are transparent enough for a manager to audit and explain to regulators

A policy tree is a decision tree trained not to predict an outcome but to prescribe an action — specifically, the action that maximises expected welfare given each customer's CATE estimate.

What Is a Policy Tree?

Definition

A policy tree is a decision tree that partitions the customer population into groups based on observable features, and prescribes the action that produces the highest expected treatment benefit for each group — using CATE estimates as the welfare signal for each split.

The GPS Analogy

A CATE forest is a GPS that generates a unique, personalised route for every car on every road simultaneously. A policy tree is the highway sign that collapses all of that precision into: "Trucks over 4.5 tonnes — left lane. Everyone else — right lane."

It sacrifices some individual precision in exchange for being operationally deployable at scale. The trade-off is intentional and measurable (through policy value).

Inputs to the Policy Tree

Individual CATE estimates from Stage 1 (CausalForest or LinearDML)
Observable customer features (service calls, contract type, charges, tenure)
Maximum tree depth constraint (set by the analyst — typically depth 2–3)

Output of the Policy Tree

A set of if-then rules mapping customer features to recommended actions
A policy value estimate: expected churn reduction if the rule is followed
Leaf-level CATE summaries (average treatment effect in each group)

Prediction Tree vs. Policy Tree: A Critical Distinction

Churn Prediction Tree

Trained on: Historical labels — did this customer churn (1) or not (0)?

Splits to maximise: Prediction accuracy (Gini impurity or cross-entropy)

Output: "This customer has a 74% probability of churning"

Question answered: Who will churn?

Key limitation: A customer with 90% churn probability may not respond to any intervention — they have already decided to leave.

Policy Tree (EconML PolicyTree)

Trained on: CATE estimates — how much does this customer's churn probability change under treatment?

Splits to maximise: Total expected welfare gain (the sum of churn reductions, i.e. −CATE, over customers in treated leaves)

Output: "Call this customer — the intervention will reduce their churn probability by 14pp"

Question answered: Who will respond to the intervention?

Key strength: Correctly deprioritises high-risk customers who will not respond, saving intervention budget for those where it matters.

The Most Common Mistake in Retention Campaigns

Calling the highest-churn-risk customers. These are often customers who have already made their decision to leave. The right target is the customer with the largest treatment response — which may be a medium-risk customer who is undecided.
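A toy comparison makes this mistake concrete. The ten customers below are hypothetical, with churn probabilities and CATEs chosen to mimic the groups described in this section. The sketch ranks them two ways and computes expected churns prevented by calling the top 30% under each ranking. The function names are illustrative.

```python
# Hypothetical customers: (churn probability, CATE of the call in pp; negative = churn reduced)
customers = [
    (0.90, -0.5), (0.85, 0.0), (0.80, -1.0),                # high risk, already decided
    (0.55, -14.0), (0.50, -12.0), (0.45, -11.0),            # medium risk, undecided and responsive
    (0.20, -1.0), (0.15, -0.5), (0.10, 0.0), (0.05, -0.5),  # low risk, settled
]


def top_share(data, key, share):
    """Return the top `share` of customers ranked by `key`, highest first."""
    k = round(share * len(data))
    return sorted(data, key=key, reverse=True)[:k]


def churns_prevented(targeted):
    """Expected churns prevented by calling `targeted` = -(sum of CATEs in pp) / 100."""
    return -sum(cate for _, cate in targeted) / 100


by_risk = top_share(customers, key=lambda c: c[0], share=0.30)   # conventional: highest churn risk
by_cate = top_share(customers, key=lambda c: -c[1], share=0.30)  # causal: largest churn reduction
print(churns_prevented(by_risk), churns_prevented(by_cate))
```

On this invented data, the risk-ranked list prevents almost nothing because its members barely respond to the call. The CATE-ranked list captures the undecided, responsive customers.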

How a Policy Tree Is Built

Two-Stage Process

Stage 1 — Estimate CATEs: Use CausalForest or LinearDML to compute an individualised CATE for each customer. These become the welfare signal that guides tree splitting.

Stage 2 — Build the policy tree: Fit EconML's PolicyTree on customer features (X) and CATE estimates. Each split is chosen to maximise the total expected churn reduction (the sum of −CATE) across the customers assigned to "treat" leaves.

How Each Split Is Chosen

For every candidate split (e.g., service_calls > 4), the algorithm asks: "If I follow this rule, how much total churn reduction do I gain across all customers in both resulting groups?" The split that maximises this gain is chosen.

This is fundamentally different from a prediction tree, which splits to minimise prediction error.
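The split search can be sketched for one feature and a single split (depth 1). This is a simplified, hypothetical illustration of the welfare criterion, not EconML's implementation. The real PolicyTree searches over all features, recurses to the chosen depth and applies safeguards such as honest splitting and minimum leaf sizes. In this sketch, each leaf is called only if its customers' summed churn reduction (the negative of their CATEs) is positive.

```python
# Hypothetical toy data: (service_calls, CATE in pp). Negative CATE = the call reduces churn.
customers = [
    (1, 1.0), (2, 0.5), (3, 0.5), (4, 0.5), (2, 1.0),  # low-call customers: call slightly backfires
    (5, -12.0), (6, -15.0), (7, -14.0),                # high-call customers: strongly responsive
]


def leaf_value(cates):
    """Return (welfare, treat?) for one leaf: call it only if total churn reduction is positive."""
    gain = -sum(cates)  # pp of churn prevented if every customer in the leaf is called
    return max(gain, 0.0), gain > 0


def best_split(data):
    """Greedy depth-1 search over rules of the form `service_calls <= t` vs `> t`."""
    best = None
    thresholds = sorted({calls for calls, _ in data})[:-1]  # drop the max so both sides are non-empty
    for t in thresholds:
        left_value, left_treat = leaf_value([c for calls, c in data if calls <= t])
        right_value, right_treat = leaf_value([c for calls, c in data if calls > t])
        total = left_value + right_value
        if best is None or total > best[1]:
            best = (t, total, left_treat, right_treat)
    return best


print(best_split(customers))                  # (4, 41.0, False, True)
print(leaf_value([c for _, c in customers]))  # no split, call everyone: (37.5, True)
```

On this toy data the best rule is "call if service_calls > 4". It is worth 41pp of churn reduction, versus 37.5pp for calling everyone, because here the low-call group's CATEs are slightly positive.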

Depth Constraint

Deeper trees give more precision but sacrifice interpretability. A depth-2 tree has at most 4 leaf nodes — 4 customer groups. A depth-3 tree has at most 8.

In practice, depth 2–3 offers the best balance between welfare maximisation and operational simplicity. Most call-centre managers can follow a 4-rule decision tree.

EconML's PolicyTree uses a variant of greedy recursive partitioning with honest splitting — the same sample is not used for both the split selection and the welfare estimation, reducing overfitting.

Reading the TeleConnect Policy Tree

Figure 5. TeleConnect policy tree (depth 2). Three groups are recommended for the proactive call; one group is deprioritised. The recommended action for each leaf is determined by its average CATE.

Policy Value: Measuring What the Rule Is Worth

Definition

Policy value is the expected total welfare gain from applying the policy tree's targeting rule to the full customer population. It is measured as the sum of CATEs across all customers the rule recommends for treatment, or equivalently, the number of customers treated multiplied by their average CATE.

Targeting strategy	Customers called	Avg CATE in called group	Expected churns prevented (n × |avg CATE|)
Policy tree rule	3,542 (Leaves 1+2+3)	−11.1pp	393
Call everyone (blanket)	5,800	−6.7pp (average CATE)	389, at higher cost
Call top 20% by churn score	1,160	−3.2pp (low CATE: high-risk but unresponsive)	37
No action	0	n/a	0

Key Finding

The policy tree rule prevents slightly more churns than blanket calling (393 vs. 389) while calling 39% fewer customers, saving 2,258 calls. Blanket calling prevents fewer churns, so the 2,258 deprioritised customers must, on average, respond slightly badly to the call. Calling the top 20% by churn score is dramatically less effective because those customers have high churn risk but low treatment response.
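The churns-prevented column follows directly from the other two: customers called times the magnitude of their average CATE. Here is a quick check of the table's figures, which are copied from the table. The helper name is illustrative.

```python
# Figures copied from the policy value table: (customers called, average CATE in pp among them)
strategies = {
    "Policy tree rule": (3542, -11.1),
    "Call everyone": (5800, -6.7),
    "Top 20% by churn score": (1160, -3.2),
}


def churns_prevented(n_called, avg_cate_pp):
    """Expected churns prevented = customers called x |average CATE| (negative CATE = fewer churns)."""
    return round(n_called * -avg_cate_pp / 100)


for name, (n, cate) in strategies.items():
    print(f"{name}: {churns_prevented(n, cate)} churns prevented")

calls_saved = strategies["Call everyone"][0] - strategies["Policy tree rule"][0]
pct_fewer = round(100 * calls_saved / strategies["Call everyone"][0])
print(calls_saved, pct_fewer)  # 2258 39
```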

CATE by Customer Segment: Who Responds Most?

Figure 6. Average CATE by segment. Green bars mark segments recommended for the proactive call; grey bars mark segments where the intervention has minimal expected benefit.

The segment with the largest expected churn reduction (CATE of −14pp) is customers with more than 4 service calls on a month-to-month plan. These customers are frustrated enough to consider leaving but have not yet made a final decision, which makes them the most responsive to intervention.

Who Not to Target: The Deprioritisation Logic

The Counter-Intuitive Finding

The policy tree recommends NOT calling 2,258 customers — even though some of them have a high predicted churn probability. This surprises managers who are used to churn prediction models.

High-Churn, Low-CATE Customers

Some customers have already made their decision. They have contacted support multiple times, compared competitors' plans, and made up their mind. A proactive call at this stage is unlikely to change the outcome — and may accelerate it by drawing attention to their dissatisfaction.

CATE: near zero. Churn probability: very high. Action: do not call.

Low-Churn, Low-CATE Customers

Happy, long-tenured customers on annual plans with low service call frequency are not at risk. Calling them has low expected benefit — and risks surprising them into thinking something is wrong.

CATE: near zero. Churn probability: low. Action: do not call.

The Right Target: Undecided, Responsive Customers

The most responsive customers (those with the most negative CATEs) are dissatisfied, as evidenced by their service calls, but have not yet committed to leaving: they are often still on a month-to-month plan and have not yet contacted a competitor. A timely, empathetic call at this moment has the highest probability of changing the outcome.

Knowledge Checkpoint 3

Policy Trees

Refer to the TeleConnect policy tree in Figure 5. A call-centre manager has a list of at-risk customers and wants to use the tree to prioritise outreach.

Q1

A churn prediction tree predicts who will churn. What does a policy tree prescribe instead?

Which customers will stay
Which customers should receive an intervention, based on their expected treatment response
The causal effect across the whole population

Q2

A customer has 3 service calls and a monthly_charges of $95. According to the tree, should they be called?

Yes — high charges push them to Leaf 3 (CALL, −6pp)
No — service_calls ≤ 4 sends them right; charges > $70 sends them to Leaf 3 (CALL)
Yes, because their churn probability is high

Q3

Why does the policy tree recommend NOT calling 2,258 customers, even though some have high churn probability?

Q4

Policy value measures the expected total welfare gain from the targeting rule. According to the policy value table, which strategy has the highest policy value per customer called?

Section 5

What If? Counterfactual Analysis

We now have SCM and policy tree estimates. Counterfactual analysis asks: what would have happened under a different decision — and by how much?

5 — Counterfactual Analysis

Individual Counterfactuals

Definition

An individual counterfactual asks: "What would this specific customer's outcome have been under a different action?" It uses the CATE estimate for that customer as the bridge between the factual and counterfactual worlds.

Customer #4821: Individual Counterfactual

Scenario	Action	Estimated churn probability
Factual (what happened)	Proactive call received	22% (model estimate under the action actually taken)
Counterfactual 1	No proactive call	22% + 14pp = 36% (estimated no-action baseline)
Counterfactual 2	$20 bill credit instead of call	36% − 4pp = 32% (estimated, if the credit's CATE is −4pp relative to no action)
Counterfactual 3	Both call and bill credit	36% − 14pp − 4pp = 18% (estimated, if the two effects simply add)

What We Can Use These For

Estimating the value of each action for a specific customer
Optimising the combination of interventions within a budget
Explaining to a manager why one action was chosen over another
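The "bridge" from the definition above can be written as a one-line rule. It relies on two assumptions: every CATE is measured relative to taking no action, and the factual probability is itself a model estimate. Undoing the effect of the action actually taken recovers the no-action baseline, and adding another action's CATE gives that action's counterfactual. The function name is hypothetical.

```python
def counterfactual_churn_pct(factual_pct, cate_taken_pp, cate_alternative_pp=0.0):
    """Estimated churn probability (%) for one customer under an alternative action.

    Assumes every CATE (in percentage points) is measured relative to taking no action.
    """
    baseline_pct = factual_pct - cate_taken_pp   # undo the action actually taken
    return baseline_pct + cate_alternative_pp    # apply the alternative (0.0 = no action)


# Customer #4821: estimated 22% churn after the call, whose CATE is -14pp
no_call = counterfactual_churn_pct(22, -14)
print(no_call)  # 36.0
```

Any other action, such as a bill credit, needs its own CATE estimate relative to no action. Combining actions by adding their CATEs further assumes that the effects do not interact.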

The Crucial Caveat

Individual counterfactuals are estimated, not observed. They rely on the CATE model's accuracy. The further the counterfactual action is from anything in the training data, the less reliable the estimate.

Policy Counterfactuals

Definition

A policy counterfactual asks: "What would the total outcome have been across the entire population if we had used a different targeting rule?" It uses the policy value framework to compare alternative decisions.

Policy scenario	Customers contacted	Total churns prevented	Cost (calls × $45/call)	Cost per churn prevented
Policy tree rule (actual)	3,542	393	$159,390	$406
Counterfactual: Call everyone	5,800	389	$261,000	$671
Counterfactual: Top 20% churn risk	1,160	37	$52,200	$1,411
Counterfactual: Top 30% CATE	1,740	312	$78,300	$251
Counterfactual: No action	0	0	$0	—

The Counterfactual Insight

If TeleConnect had used churn risk scores (the conventional approach) rather than CATE-based policy trees, it would have prevented only 37 churns at a cost of $1,411 per churn prevented, versus 393 churns at $406 each with the policy tree. The CATE approach prevents roughly ten times as many churns, at about a third of the cost per churn prevented.
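The cost columns can be reproduced from the contact counts and churns prevented at the stated $45 per call. Ranking scenarios by cost per churn prevented is one simple way to compare targeting rules before a campaign. This is a sketch with figures copied from the table. "No action" is left out because it prevents no churns, so its cost per churn is undefined.

```python
COST_PER_CALL = 45  # dollars per proactive call, as stated in the table

# (customers contacted, churns prevented), copied from the policy counterfactual table
scenarios = {
    "Policy tree rule": (3542, 393),
    "Call everyone": (5800, 389),
    "Top 20% churn risk": (1160, 37),
    "Top 30% CATE": (1740, 312),
}


def campaign_cost(contacted, prevented, cost_per_call=COST_PER_CALL):
    """Return (total cost in $, cost per churn prevented in $, rounded)."""
    total = contacted * cost_per_call
    return total, round(total / prevented)


results = {name: campaign_cost(*counts) for name, counts in scenarios.items()}

# Cheapest cost per churn prevented first
for name, (total, per_churn) in sorted(results.items(), key=lambda item: item[1][1]):
    print(f"{name}: ${total:,} total, ${per_churn:,} per churn prevented")
```

The lowest cost per churn is not the whole story. The top-30% CATE rule is cheapest per churn but prevents fewer churns in total (312 vs. 393), so the right choice depends on the budget and on the value of a retained customer.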

What If We Had Started the Campaign Earlier?

The Timing Counterfactual

TeleConnect launched its retention campaign in month 7 (simultaneously with the pricing change). What if they had started two months earlier — in month 5 — reaching customers before their frustration peaked?

Figure 7. Counterfactual comparison of retention strategies by total churns prevented. Launching 2 months earlier prevents about 19% more churns than the actual policy, because customers are contacted before frustration becomes a firm decision.

What If We Had Targeted Differently?

The Targeting Counterfactual: A Framework

Every targeting decision implies a counterfactual: "What if we had used a different rule?" Quantifying these counterfactuals before the campaign launches (using CATE estimates and policy value) allows TeleConnect to choose the best strategy — rather than evaluating it only in retrospect.

Before a Campaign

Use CATE estimates and policy tree simulations to compare alternative targeting rules. Choose the rule with the highest policy value per dollar spent.

Proactive — optimal decisions without waiting for outcomes.

After a Campaign

Use SCM to measure what actually happened vs. the synthetic counterfactual. Compare with the pre-campaign policy value prediction. Validate or update the model.

Retrospective — model validation and learning loop.

Real-Time

As new customers are added, re-estimate CATEs and re-run the policy tree. Update the call list weekly to reflect the most current risk and responsiveness estimates.

Continuous — the model becomes the operational system.

The Business Case for Counterfactual Thinking

Every business decision is a bet against an implicit counterfactual. Making that counterfactual explicit — and quantifying it with the tools we have studied — is the difference between data-informed decision-making and genuine causal intelligence.

Knowledge Checkpoint 4

Integration: Putting It All Together

TeleConnect's leadership is reviewing the WA pricing change and planning next quarter's retention strategy for the five states in this analysis (WA, NSW, VIC, QLD, SA).

Q1

Which tool should TeleConnect use to answer: "Did the WA pricing change actually reduce churn — compared to a world where it never happened?"

Policy Tree
DML / CausalForest
Synthetic Control Model (SCM)
SHAP feature importance

Q2

Which tool translates CATE estimates into an actionable "call these customers, not those" targeting rule for the call-centre team?

Synthetic Control Model
Policy Tree
SHAP Waterfall
Naive ATE comparison

Q3

A strategy analyst asks: "What would total churn be if we had targeted only the top 30% of customers by CATE instead of using the full policy tree rule?" What type of analysis is this?

Q4

In one sentence, explain how SCM and CATE/policy trees complement each other in a complete causal analysis.

Section 6

Putting It Together

Three tools. One business event. A complete causal story.

6 — Synthesis

Three Tools, One Business Question

Business question	Tool	Answer for TeleConnect WA
"Did the pricing change actually reduce churn — or would churn have fallen anyway?"	SCM	Yes — real WA churn fell 17pp below its synthetic twin by month 12, with an excellent pre-period fit (RMSPE = 0.4pp)
"Who responded most to the retention call? Who should we target next quarter?"	Policy Tree	Call customers with service_calls > 4 (especially on monthly plans). Expected CATE: −14pp. Deprioritise low-call, low-charge customers (CATE: −1pp)
"What would have happened if we had launched the campaign 2 months earlier, or targeted only the top 30% by CATE?"	Counterfactual Analysis	Earlier launch: about 19% more churns prevented. Top 30% CATE: 79% of the benefit at 49% of the cost. Do not target on churn risk scores alone: they prevented about a tenth as many churns as the policy tree (37 vs. 393) at 3.5x the cost per churn prevented.

The Combined Workflow

Past: SCM validates that the policy worked overall. Present: Policy tree operationalises who to contact next. Future: Counterfactual analysis tests alternative strategies before they are executed, closing the loop between data science and business planning.

Key Takeaways

Synthetic Control Model

Builds a synthetic twin of the treated unit from a weighted blend of untreated donor units
Pre-treatment fit is the validity check — the lines must match before the policy launches
The post-policy gap is the causal effect estimate
Relies on close pre-period fit, no spillover, no anticipation and a stable donor pool

Policy Trees

Translate 12,000 individual CATEs into an interpretable targeting rule
Trained on treatment effect estimates, not outcome labels
High churn risk does not imply high CATE — target the undecided, not the committed
Policy value measures the expected benefit of the rule vs. alternatives

Counterfactual Analysis

Every causal question is a "what if?" — a comparison between the factual and an unobserved alternative
Individual counterfactuals use CATE to estimate alternative outcomes per customer
Policy counterfactuals use policy value to compare alternative targeting strategies
Counterfactual thinking turns hindsight into foresight

The Principle Connecting All Three

Prediction tells you what will happen.
Causation tells you how to change it.
Counterfactual analysis tells you what the change is worth — and whether there is a better one.

Workshop: Week 4 Jupyter Notebook

Part	What you will do	Tool
Demo 1	Fit a Synthetic Control to the TeleConnect WA dataset; visualise pre-period fit and the post-policy gap	SparseSC or manual weighted regression
Demo 2	Build a policy tree from the Week 3 CATE estimates; compare leaf-level CATEs and compute policy value	EconML PolicyTree
Demo 3	Run three "what if?" counterfactual scenarios and compare total churn prevented under each	Policy value simulation
Exercises	Repeat the full workflow on a new dataset (MobTel national rollout); fill in code and markdown blanks	All three tools

DATA5000 — Week 4 | Open your Jupyter notebook to begin