DATA5000 — Artificial Intelligence Programming in Business Analytics
Kaplan Business School
Overview
Why every causal question is a "what if?" question, and why the answer is always unobservable.
How to evaluate a policy when there is no control group — by building a synthetic twin of the treated region.
How to turn 12,000 personalised CATE estimates into one clear targeting rule a manager can act on.
TeleConnect changed its pricing structure in Western Australia six months ago. Did it work? Who did it work for? And what should the retention team do next quarter?
SCM answers "did it work overall." Policy trees answer "for whom." Counterfactual analysis answers "what should we do instead."
These three tools pick up precisely where Week 3 left off — DML gave us individual treatment effects; today we learn what to do with them at scale.
1 — Bridge from Week 3
| Tool | What it does | Output |
|---|---|---|
| LightGBM + SHAP | Predicts churn; explains which features drive the prediction | Feature importance rankings (correlation-based) |
| LinearDML (EconML) | Estimates the causal effect of one action on one outcome, controlling for confounders | ATE, ATT, and individual CATE values for each customer |
DML requires treated and untreated customers observed at the same time. But what if the policy applied to everyone in a region simultaneously? And DML gives us 12,000 individual CATE numbers — how do we turn those into an operational targeting decision?
Week 4 answers both questions.
All three tools are answering the same underlying question from different angles: what is the best action to take, for whom, and how do we know?
Six months ago TeleConnect launched a new pricing structure in Western Australia — the first and only state where it was applied. Monthly charges were restructured, and a loyalty discount was introduced for customers with more than 12 months of tenure.
TeleConnect's leadership now wants answers to three questions:
Question 1 → Synthetic Control Model
Question 2 → Policy Tree (CATE-based)
Question 3 → Counterfactual Analysis
Section 2
Counterfactual Foundations
Before we build any model, we need to understand the philosophical problem every causal tool is trying to solve.
2 — Counterfactual Foundations
To know whether an action caused an outcome, you need to compare what happened with the action against what would have happened without it — but you can only observe one of those two worlds.
Figure 1. Every customer exists in only one of the two paths. The other is forever unobservable.
This is called the Fundamental Problem of Causal Inference (Holland, 1986). Every causal method (DML, SCM, randomised experiments) is a different strategy for estimating the unobservable path as precisely as possible.
A counterfactual is the outcome that would have occurred in an alternative version of events that did not actually happen. It answers the question: "What if things had been different?"
These feel natural to ask. They are almost impossible to answer rigorously without the right tools.
A doctor prescribes chemotherapy. The patient recovers. Did chemotherapy cause the recovery — or would the patient have recovered anyway? The counterfactual world (no chemotherapy) is unobservable. All clinical trials, all causal AI, and all the tools we study this week are attempting to rigorously construct this unobservable comparison.
At any moment in time, each customer, state, or business unit exists in exactly one state: treated or untreated. Once WA receives the pricing change, the "WA without the pricing change" no longer exists to observe. Time cannot be rewound.
This is the "missing data" problem that all causal tools solve through estimation.
The simplest (and most dangerous) approach is to compare WA's churn rate before and after the pricing change. But this conflates the pricing effect with everything else that changed during that period: seasonality, competitor moves, economic conditions, service upgrades. A good counterfactual must control for all of these.
| Tool | Counterfactual question it answers | Level of analysis | When to use |
|---|---|---|---|
| Synthetic Control (SCM) | "What would WA's churn rate have been if the pricing change had never happened?" | Aggregate: region, market, or time period | Whole-of-region policies with no concurrent control group |
| DML + CausalForest (Week 3) | "What would Customer #4821's churn probability be if we changed their contract type?" | Individual: each customer's causal response | Observational data with treated and untreated customers observed simultaneously |
| Policy Tree + Policy Value | "What would total churn be under a different targeting rule than the one we used?" | Population: the entire customer base under a different decision | Translating CATE estimates into operational targeting decisions |
These tools are complementary, not competing. A complete analysis uses all three: SCM to validate the overall policy, CATE to personalise the response, and policy trees to operationalise the recommendation.
Knowledge Checkpoint 1
Section 3
Synthetic Control Model
How do you evaluate a policy when the entire region was treated — and there is no untreated group to compare against?
3 — Synthetic Control Model
TeleConnect changed pricing in WA only. There are no WA customers who did not receive the change. The counterfactual — "WA without the pricing change" — has no direct empirical counterpart.
DML needs a treated group and a control group observed at the same point in time. In WA, everyone is treated. There is no concurrent control individual to form the residual comparison.
TeleConnect cannot randomly assign some WA customers to a "no pricing change" condition — the policy was a commercial decision applied uniformly. The experiment was never designed; it just happened.
Use data from other states that did not receive the pricing change — NSW, VIC, QLD, SA — to construct a synthetic version of WA that mimics what WA would have looked like in the absence of the intervention. The gap between real WA and its synthetic twin, measured after the policy launch, is the causal effect.
What it does: Compare WA's churn rate in months 1–6 (pre) vs. months 7–12 (post).
Why it fails: The drop may be partly or entirely driven by factors unrelated to the pricing change — seasonal patterns, a competitor's service outage, national economic conditions, or TeleConnect's own service improvements that coincided with the launch.
What it does: Compare WA's churn in the post-period to NSW's churn in the post-period.
Why it fails: NSW may differ from WA in customer demographics, competitive landscape, urbanisation, income levels, or TeleConnect's network quality. These pre-existing differences will contaminate the estimate.
A valid comparison unit must have been on the same trajectory as WA before the policy. No single state is a perfect match. But a weighted blend of multiple states might be — and SCM finds those weights systematically.
You want to know whether your new diet is working. You cannot un-eat the food. But you could find several friends who share your starting weight, metabolism, activity level, and eating habits — and combine their outcomes into a composite "you-without-the-diet." The difference between your actual weight and your synthetic twin's weight after six weeks is your diet's causal effect.
Figure 2. SCM constructs a synthetic WA from a weighted blend of untreated donor states, chosen to match WA's pre-policy behaviour.
SCM finds the weights $w_{\text{NSW}}, w_{\text{VIC}}, w_{\text{QLD}}, \ldots$ that minimise the difference between real WA and the synthetic WA during the pre-treatment period, subject to the weights being non-negative and summing to 1.
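One common way to write this down, as a simplified sketch that matches on pre-period churn only, is shown here. The symbols are assumptions introduced for illustration: $Y_{\text{WA},t}$ is WA's churn rate in month $t$, $Y_{j,t}$ is donor state $j$'s churn rate in the same month, $J$ is the number of donor states, and $T_0$ is the last pre-policy month.

$$
\min_{w_1,\ldots,w_J}\ \sum_{t=1}^{T_0}\Big(Y_{\text{WA},t}-\sum_{j=1}^{J} w_j\,Y_{j,t}\Big)^2
\quad\text{subject to}\quad w_j \ge 0,\quad \sum_{j=1}^{J} w_j = 1
$$

Full SCM implementations typically also match on pre-policy covariates, which this form leaves out.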
In this example, NSW gets a weight of 0.45 because its pre-policy churn trend looks most like WA's, while SA gets a weight of 0.05 because it is not very similar. A state that looks nothing like WA will get a weight of zero.
The weights are not chosen by hand — they are the solution to a constrained least-squares problem.
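To make the idea concrete, here is a minimal sketch that searches for donor weights on hypothetical data. All churn figures, function names, and the grid resolution are assumptions made for illustration. A coarse grid search stands in for a proper constrained least-squares solver, so the result is only approximate.

```python
from itertools import product
from math import sqrt

# Hypothetical pre-period monthly churn rates (%), months 1-6. Illustrative only.
wa_pre = [38.9, 39.0, 39.1, 39.3, 39.1, 39.5]
donors_pre = {
    "NSW": [38.0, 38.5, 39.0, 39.5, 40.0, 40.5],
    "VIC": [42.0, 41.5, 41.0, 40.5, 40.0, 39.5],
    "QLD": [36.0, 37.0, 36.0, 37.0, 36.0, 37.0],
    "SA": [45.0, 44.0, 45.0, 44.0, 45.0, 44.0],
}


def synthetic_series(weights, donors):
    """Weighted blend of the donor series, one value per month."""
    names = list(donors)
    months = len(donors[names[0]])
    return [sum(weights[n] * donors[n][t] for n in names) for t in range(months)]


def rmspe(actual, synthetic):
    """Root mean squared prediction error between two equal-length series."""
    return sqrt(sum((a - s) ** 2 for a, s in zip(actual, synthetic)) / len(actual))


def fit_weights(target, donors, steps=20):
    """Grid search over weights with w >= 0 and sum(w) == 1 (grid spacing 1/steps)."""
    names = list(donors)
    best_weights, best_error = None, float("inf")
    for counts in product(range(steps + 1), repeat=len(names) - 1):
        remainder = steps - sum(counts)
        if remainder < 0:
            continue  # these weights would sum to more than 1
        weights = dict(zip(names, [c / steps for c in counts + (remainder,)]))
        error = rmspe(target, synthetic_series(weights, donors))
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights, best_error


weights, pre_fit = fit_weights(wa_pre, donors_pre)
print(weights, round(pre_fit, 3))
```

The returned `pre_fit` is the pre-period RMSPE for the chosen weights. In practice a quadratic-programming solver or a dedicated SCM package would replace the grid search.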
Before trusting the post-policy gap, we verify that synthetic WA tracks real WA closely during the pre-policy period. If the lines do not match in the pre-period, the synthetic control is not credible — and the post-period gap should not be interpreted causally.
Figure 3. During the pre-policy period (months 1–6), synthetic WA matches real WA closely. This validates the synthetic control before we examine the post-period.
The lines are nearly identical in the pre-period. This is the key validity check: if the synthetic WA could not track real WA before the policy, we would have no reason to trust it as a counterfactual after the policy.
Figure 4. After the policy launches, real WA churn falls while synthetic WA continues on its natural trajectory. The gap at month 12 is approximately 17 percentage points — the estimated causal effect.
| Metric | Value | Interpretation |
|---|---|---|
| Real WA churn at M12 | 22% | The actual observed outcome after the pricing change |
| Synthetic WA churn at M12 | 39% | Estimated counterfactual — what WA would look like without the change |
| SCM causal effect | −17 percentage points | By month 12, WA's churn rate was 17pp below its synthetic counterfactual (22% − 39%) |
| Pre-period fit (RMSPE) | 0.4pp | Synthetic WA tracked real WA within 0.4pp on average in the pre-period — excellent fit |
If WA has 80,000 TeleConnect customers, a 17pp reduction in monthly churn rate represents approximately 13,600 customers retained per month who would otherwise have left. At an average customer lifetime value of $400, that is $5.4 million in preserved revenue per month.
The pre-period RMSPE of 0.4pp means the synthetic control was tracking WA within less than half a percentage point before the policy. This tight pre-period fit gives us high confidence that the 17pp post-period gap is driven by the pricing change, not by pre-existing differences.
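The headline figures can be reproduced with a few lines of arithmetic. This sketch uses only the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the results table and the paragraph above.
real_wa_m12 = 22.0        # real WA churn at month 12 (%)
synthetic_wa_m12 = 39.0   # synthetic WA churn at month 12 (%)
wa_customers = 80_000     # assumed WA customer base from the text
lifetime_value = 400      # average customer lifetime value ($)

# SCM effect: real outcome minus estimated counterfactual, in percentage points.
effect_pp = real_wa_m12 - synthetic_wa_m12

# Customers retained per month and the value attached to them.
retained = wa_customers * abs(effect_pp) / 100
preserved_value = retained * lifetime_value

print(effect_pp, retained, preserved_value)
```

The product comes to $5.44 million, which the text rounds to $5.4 million.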
| Assumption | Plain language | What happens if violated |
|---|---|---|
| Common trends (pre-period fit) | The synthetic control must track the treated unit closely before the intervention | The post-period gap is not credibly attributable to the treatment |
| No spillover | The WA pricing change must not affect customer behaviour in the donor states (NSW, VIC, QLD, SA) | Donor states are contaminated, so the synthetic twin is biased |
| No anticipation | Customers must not have changed behaviour before the policy launched in response to knowing it was coming | The pre-period baseline is contaminated and the effect is understated |
| Stable donor pool | No major concurrent shocks hit the donor states (NSW, VIC, QLD, SA) that did not also hit WA | The synthetic WA's post-period trajectory no longer represents WA's counterfactual |
If WA customers tell NSW family members about the new pricing and those NSW customers switch providers, the donor pool is contaminated. TeleConnect should check whether NSW churn changed immediately after the WA launch as a diagnostic.
Knowledge Checkpoint 2
Section 4
Policy Trees
SCM told us the pricing change worked overall. Now: who responded most — and how do we build a targeting rule a manager can actually follow?
4 — Policy Trees
EconML's CausalForest estimated a personalised treatment effect (CATE) for each of TeleConnect's 12,000 WA customers — the expected reduction in churn probability if each individual receives the proactive retention call.
A call-centre manager cannot process 12,000 individual scores. They need a rule: "Call customers who look like X. Do not call customers who look like Y."
Even if they could, a rule with thousands of conditions is not auditable, explainable, or robust to new customers.
A small number of interpretable rules — ideally expressible as a short decision tree — that:
A policy tree is a decision tree trained not to predict an outcome but to prescribe an action — specifically, the action that maximises expected welfare given each customer's CATE estimate.
A policy tree is a decision tree that partitions the customer population into groups based on observable features, and prescribes the action that produces the highest expected treatment benefit for each group — using CATE estimates as the welfare signal for each split.
A CATE forest is a GPS that generates a unique, personalised route for every car on every road simultaneously. A policy tree is the highway sign that collapses all of that precision into: "Trucks over 4.5 tonnes — left lane. Everyone else — right lane."
It sacrifices some individual precision in exchange for being operationally deployable at scale. The trade-off is intentional and measurable (through policy value).
Trained on: Historical labels — did this customer churn (1) or not (0)?
Splits to maximise: Prediction accuracy (Gini impurity or cross-entropy)
Output: "This customer has a 74% probability of churning"
Question answered: Who will churn?
Key limitation: A customer with 90% churn probability may not respond to any intervention — they have already decided to leave.
Trained on: CATE estimates — how much does this customer's churn probability change under treatment?
Splits to maximise: Total expected welfare gain (sum of CATEs in treated leaves)
Output: "Call this customer — the intervention will reduce their churn probability by 14pp"
Question answered: Who will respond to the intervention?
Key strength: Correctly deprioritises high-risk customers who will not respond, saving intervention budget for those where it matters.
Calling the highest-churn-risk customers. These are often customers who have already made their decision to leave. The right target is the customer with the largest treatment response — which may be a medium-risk customer who is undecided.
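The contrast between the two targeting logics can be sketched on a handful of made-up customers. Every value below is hypothetical and chosen only to illustrate the point; `budget` is an assumed number of calls the team can make.

```python
# Hypothetical customers. cate_pp is the change in churn probability (in
# percentage points) if called; more negative means a bigger benefit.
customers = [
    {"id": "A", "churn_risk": 0.90, "cate_pp": -1.0},
    {"id": "B", "churn_risk": 0.85, "cate_pp": -2.0},
    {"id": "C", "churn_risk": 0.55, "cate_pp": -14.0},
    {"id": "D", "churn_risk": 0.50, "cate_pp": -12.0},
    {"id": "E", "churn_risk": 0.10, "cate_pp": -0.5},
]
budget = 2  # assumed number of calls available

# Conventional targeting: call the customers most likely to churn.
by_risk = sorted(customers, key=lambda c: c["churn_risk"], reverse=True)[:budget]

# CATE-based targeting: call the customers who respond most to the call.
by_cate = sorted(customers, key=lambda c: c["cate_pp"])[:budget]


def total_reduction_pp(group):
    """Sum of expected churn reductions (pp) across the called customers."""
    return -sum(c["cate_pp"] for c in group)


print([c["id"] for c in by_risk], total_reduction_pp(by_risk))
print([c["id"] for c in by_cate], total_reduction_pp(by_cate))
```

With these illustrative numbers, the risk-ranked list spends the budget on customers who barely respond, while the CATE-ranked list picks the medium-risk customers with the largest response.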
Stage 1 — Estimate CATEs: Use CausalForest or LinearDML to compute an individualised CATE for each customer. These become the welfare signal that guides tree splitting.
Stage 2 — Build the policy tree: Fit EconML's PolicyTree on customer features (X) and CATE estimates. Each split is chosen to maximise the weighted sum of CATEs in the "treat" leaf.
For every candidate split (e.g., service_calls > 4), the algorithm asks: "If I follow this rule, how much total churn reduction do I gain across all customers in both resulting groups?" The split that maximises this gain is chosen.
This is fundamentally different from a prediction tree, which splits to minimise prediction error.
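The split-selection logic described above can be sketched for a single split. This is not EconML's implementation: it is a simplified, depth-1 illustration with hypothetical data, and `BREAK_EVEN_PP` is an assumed threshold below which a call is not worth making.

```python
# Hypothetical customers: service calls and CATE in percentage points
# (negative = the call reduces churn). Illustrative values only.
customers = [
    {"service_calls": 0, "cate_pp": -1.0},
    {"service_calls": 1, "cate_pp": -1.0},
    {"service_calls": 2, "cate_pp": -2.0},
    {"service_calls": 3, "cate_pp": -3.0},
    {"service_calls": 4, "cate_pp": -3.0},
    {"service_calls": 5, "cate_pp": -12.0},
    {"service_calls": 6, "cate_pp": -14.0},
    {"service_calls": 7, "cate_pp": -13.0},
]
BREAK_EVEN_PP = 4.0  # assumed: a call pays off only above 4pp of churn reduction

# Net benefit of calling each customer, in pp above the break-even point.
for c in customers:
    c["net_benefit"] = -c["cate_pp"] - BREAK_EVEN_PP


def leaf_welfare(net_benefits):
    """Welfare of a leaf under its best action: call everyone in it, or no one."""
    return max(0.0, sum(net_benefits))


def best_split(rows, feature):
    """Pick the threshold whose two leaves give the highest combined welfare."""
    best_threshold, best_welfare = None, float("-inf")
    for threshold in sorted({r[feature] for r in rows}):
        left = [r["net_benefit"] for r in rows if r[feature] <= threshold]
        right = [r["net_benefit"] for r in rows if r[feature] > threshold]
        welfare = leaf_welfare(left) + leaf_welfare(right)
        if welfare > best_welfare:
            best_threshold, best_welfare = threshold, welfare
    return best_threshold, best_welfare


threshold, welfare = best_split(customers, "service_calls")
treat_all_welfare = leaf_welfare([c["net_benefit"] for c in customers])
print(f"Rule: call if service_calls > {threshold}; welfare {welfare} vs {treat_all_welfare} for calling everyone")
```

A prediction tree would score the same candidate thresholds by impurity or error instead of welfare, which is why the two trees can choose different splits on the same data.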
Deeper trees give more precision but sacrifice interpretability. A depth-2 tree has at most 4 leaf nodes — 4 customer groups. A depth-3 tree has at most 8.
In practice, depth 2–3 offers the best balance between welfare maximisation and operational simplicity. Most call-centre managers can follow a 4-rule decision tree.
EconML's PolicyTree uses a variant of greedy recursive partitioning with honest splitting — the same sample is not used for both the split selection and the welfare estimation, reducing overfitting.
Figure 5. TeleConnect policy tree (depth 2). Three groups are recommended for the proactive call; one group is deprioritised. The recommended action for each leaf is determined by its average CATE.
Policy value is the expected total welfare gain from applying the policy tree's targeting rule to the full customer population. Here it is measured as the average CATE among customers recommended for treatment multiplied by the number of those customers, which gives the total expected churn reduction.
| Targeting Strategy | Customers called | Avg CATE in treated group | Total churn reduction (pp × n) |
|---|---|---|---|
| Policy tree rule | 3,542 (Leaves 1+2+3) | −11.1pp | −393 churns prevented |
| Call everyone (blanket) | 5,800 | −6.7pp (average CATE) | −389 churns, higher cost |
| Call top 20% by churn score | 1,160 | −3.2pp (low CATE — high-risk but unresponsive) | −37 churns prevented |
| No action | 0 | — | 0 |
The policy tree rule prevents slightly more churns than blanket calling (393 vs. 389) while calling 39% fewer customers, which saves 2,258 calls. Calling the top 20% by churn score is dramatically less effective because those customers have high churn risk but low treatment response.
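The table's last column follows directly from the first two. This short sketch recomputes it from the figures in the table; the dictionary layout and names are illustrative.

```python
# (customers called, average CATE in pp) taken from the table above.
strategies = {
    "Policy tree rule": (3542, -11.1),
    "Call everyone": (5800, -6.7),
    "Top 20% by churn score": (1160, -3.2),
}

# Total churns prevented = customers called x average churn reduction.
prevented = {
    name: round(called * abs(cate_pp) / 100)
    for name, (called, cate_pp) in strategies.items()
}

# How many fewer calls the policy tree needs compared with blanket calling.
calls_saved = strategies["Call everyone"][0] - strategies["Policy tree rule"][0]
share_saved_pct = round(100 * calls_saved / strategies["Call everyone"][0])

print(prevented, calls_saved, share_saved_pct)
```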
Figure 6. Average CATE by segment. Green-region bars are recommended for the proactive call; grey bars represent customers where the intervention has minimal expected benefit.
The segment with the largest CATE in magnitude (−14pp) is customers with more than 4 service calls on a month-to-month plan. These customers are frustrated enough to consider leaving but have not yet made a final decision, so they are the most responsive to intervention.
The policy tree recommends NOT calling 2,258 customers — even though some of them have a high predicted churn probability. This surprises managers who are used to churn prediction models.
Some customers have already made their decision. They have contacted support multiple times, compared competitors' plans, and made up their mind. A proactive call at this stage is unlikely to change the outcome — and may accelerate it by drawing attention to their dissatisfaction.
CATE: near zero. Churn probability: very high. Action: do not call.
Happy, long-tenured customers on annual plans with low service call frequency are not at risk. Calling them has low expected benefit — and risks surprising them into thinking something is wrong.
CATE: near zero. Churn probability: low. Action: do not call.
The highest-CATE customers are those who are dissatisfied (evidenced by service calls) but have not yet committed to leaving (often still on month-to-month plans and not yet in contact with a competitor). A timely, empathetic call at this moment has the highest probability of changing the outcome.
Knowledge Checkpoint 3
Section 5
What If? Counterfactual Analysis
We now have SCM and policy tree estimates. Counterfactual analysis asks: what would have happened under a different decision — and by how much?
5 — Counterfactual Analysis
An individual counterfactual asks: "What would this specific customer's outcome have been under a different action?" It uses the CATE estimate for that customer as the bridge between the factual and counterfactual worlds.
| Scenario | Action | Estimated churn probability |
|---|---|---|
| Factual (what happened) | Proactive call received | 22% (actual outcome) |
| Counterfactual 1 | No proactive call | 22% + 14pp = 36% (estimated) |
| Counterfactual 2 | $20 bill credit instead of call | 22% − 4pp = 18% (estimated) |
| Counterfactual 3 | Both call and bill credit | 22% − 4pp − 6pp = 12% (estimated) |
Individual counterfactuals are estimated, not observed. They rely on the CATE model's accuracy. The further the counterfactual action is from anything in the training data, the less reliable the estimate.
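The first counterfactual row shows how a CATE links the two worlds: the untreated outcome is the treated outcome minus the treatment effect. A minimal sketch, working in whole percentage points and using a hypothetical function name:

```python
def untreated_outcome_pp(treated_outcome_pp, cate_pp):
    """Estimated churn probability (pp) without the action.

    cate_pp is the effect of the action: treated outcome minus untreated
    outcome, so a negative value means the action reduces churn.
    """
    return treated_outcome_pp - cate_pp


# Factual: the customer was called and has a 22% churn probability.
# The estimated CATE of the call for this customer is -14pp.
no_call_pp = untreated_outcome_pp(22, -14)
print(no_call_pp)
```

The same subtraction only holds for actions the CATE model was actually estimated on, which is why the reliability caveat above matters.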
A policy counterfactual asks: "What would the total outcome have been across the entire population if we had used a different targeting rule?" It uses the policy value framework to compare alternative decisions.
| Policy scenario | Customers contacted | Total churns prevented | Cost (calls × $45/call) | Cost per churn prevented |
|---|---|---|---|---|
| Policy tree rule (actual) | 3,542 | 393 | $159,390 | $406 |
| Counterfactual: Call everyone | 5,800 | 389 | $261,000 | $671 |
| Counterfactual: Top 20% churn risk | 1,160 | 37 | $52,200 | $1,411 |
| Counterfactual: Top 30% CATE | 1,740 | 312 | $78,300 | $251 |
| Counterfactual: No action | 0 | 0 | $0 | — |
If TeleConnect had used churn risk scores (the conventional approach) rather than CATE-based policy trees, it would have prevented only 37 churns at a cost of $1,411 per churn prevented, versus 393 churns at $406 each with the policy tree. The policy tree rule prevents roughly 10 times as many churns and costs less than a third as much per churn prevented.
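The cost columns in the table can be recomputed from the contact counts, the churns prevented, and the $45 cost per call. The sketch below uses only figures from the table; the names are illustrative.

```python
COST_PER_CALL = 45  # dollars per call, from the table header

# (customers contacted, total churns prevented) taken from the table above.
scenarios = {
    "Policy tree rule": (3542, 393),
    "Call everyone": (5800, 389),
    "Top 20% churn risk": (1160, 37),
    "Top 30% CATE": (1740, 312),
}

summary = {}
for name, (contacted, churns_prevented) in scenarios.items():
    cost = contacted * COST_PER_CALL
    summary[name] = {
        "cost": cost,
        "cost_per_churn": round(cost / churns_prevented),
    }

# Policy tree vs. churn-risk targeting: ratio of churns prevented.
churn_ratio = scenarios["Policy tree rule"][1] / scenarios["Top 20% churn risk"][1]

# Top 30% CATE vs. policy tree: share of benefit and share of cost.
benefit_share = scenarios["Top 30% CATE"][1] / scenarios["Policy tree rule"][1]
cost_share = summary["Top 30% CATE"]["cost"] / summary["Policy tree rule"]["cost"]

for name, row in summary.items():
    print(name, row)
print(round(churn_ratio, 1), round(benefit_share, 2), round(cost_share, 2))
```

Ranking scenarios by `cost_per_churn` rather than by total churns prevented is what surfaces the top 30% CATE rule as the cheapest option per churn prevented.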
TeleConnect launched its retention campaign in month 7 (simultaneously with the pricing change). What if they had started two months earlier — in month 5 — reaching customers before their frustration peaked?
Figure 7. Counterfactual comparison of retention strategies by total churns prevented. Launching 2 months earlier recovers ~19% more churns than the actual policy, because customers are contacted before frustration becomes a firm decision.
Every targeting decision implies a counterfactual: "What if we had used a different rule?" Quantifying these counterfactuals before the campaign launches (using CATE estimates and policy value) allows TeleConnect to choose the best strategy — rather than evaluating it only in retrospect.
Use CATE estimates and policy tree simulations to compare alternative targeting rules. Choose the rule with the highest policy value per dollar spent.
Proactive — optimal decisions without waiting for outcomes.
Use SCM to measure what actually happened vs. the synthetic counterfactual. Compare with the pre-campaign policy value prediction. Validate or update the model.
Retrospective — model validation and learning loop.
As new customers are added, re-estimate CATEs and re-run the policy tree. Update the call list weekly to reflect the most current risk and responsiveness estimates.
Continuous — the model becomes the operational system.
Every business decision is a bet against an implicit counterfactual. Making that counterfactual explicit — and quantifying it with the tools we have studied — is the difference between data-informed decision-making and genuine causal intelligence.
Knowledge Checkpoint 4
Section 6
Putting It Together
Three tools. One business event. A complete causal story.
6 — Synthesis
| Business question | Tool | Answer for TeleConnect WA |
|---|---|---|
| "Did the pricing change actually reduce churn — or would churn have fallen anyway?" | SCM | Yes — real WA churn fell 17pp below its synthetic twin by month 12, with an excellent pre-period fit (RMSPE = 0.4pp) |
| "Who responded most to the retention call? Who should we target next quarter?" | Policy Tree | Call customers with service_calls > 4 (especially on monthly plans). Expected CATE: −14pp. Deprioritise low-call, low-charge customers (CATE: −1pp) |
| "What would have happened if we had launched the campaign 2 months earlier, or targeted only the top 30% by CATE?" | Counterfactual Analysis | Earlier launch: +19% more churns prevented. Top 30% CATE: 79% of the benefit at 49% of the cost. Never use churn risk scores alone: targeting the top 20% by churn risk prevents roughly 10x fewer churns than the policy tree rule (37 vs. 393). |
Past: SCM validates that the policy worked overall. Present: Policy tree operationalises who to contact next. Future: Counterfactual analysis tests alternative strategies before they are executed, closing the loop between data science and business planning.
Prediction tells you what will happen.
Causation tells you how to change it.
Counterfactual analysis tells you what the change is worth — and whether there is a better one.
| Part | What you will do | Tool |
|---|---|---|
| Demo 1 | Fit a Synthetic Control to the TeleConnect WA dataset; visualise pre-period fit and the post-policy gap | SparseSC or manual weighted regression |
| Demo 2 | Build a policy tree from the Week 3 CATE estimates; compare leaf-level CATEs and compute policy value | EconML PolicyTree |
| Demo 3 | Run three "what if?" counterfactual scenarios and compare total churn prevented under each | Policy value simulation |
| Exercises | Repeat the full workflow on a new dataset (MobTel national rollout); fill in code and markdown blanks | All three tools |
DATA5000 — Week 4 | Open your Jupyter notebook to begin