A practical guide for making complex causal inference concepts accessible to students
Begin with concrete, everyday examples to build intuition before introducing formal concepts.
"Imagine you're planning a road trip and have two route options: Highway A or Highway B. You choose Highway A and arrive in 3 hours. But how do you know if that was the best choice? You can't go back in time and try Highway B."
This illustrates the fundamental problem of causal inference: we never observe both potential outcomes for the same unit at the same time.
"Imagine you love a restaurant's secret sauce but can't access their recipe. To recreate it, you mix different proportions of ketchup, mayo, spices, etc. until you match the taste."
That's what the Synthetic Control Method does: we "recreate" California as a blend of 30% Nevada, 45% Washington, and 25% Oregon to estimate what would have happened without a policy intervention.
Set up a demonstration with colored water in clear cups:
A counterfactual is what would have happened to a treated unit if it had not received the treatment. It's the alternative reality we never observe but need to estimate.
The treatment effect is the gap between what actually happened and what would have happened without the intervention. This difference represents the causal impact of the treatment.
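The definition can be written compactly. The notation below is an assumption for illustration, not something the guide prescribes: $Y_{1t}$ is the observed outcome of the treated unit at time $t$, and $Y_{1t}^{N}$ is the outcome the same unit would have had at time $t$ without treatment.

$$
\tau_{t} = Y_{1t} - Y_{1t}^{N}
$$

After the intervention only $Y_{1t}$ is observed, so $Y_{1t}^{N}$ has to be estimated before $\tau_{t}$ can be computed.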
[Figure: actual house price compared with the counterfactual price (no fireplace)]
The Synthetic Control Method (SCM) creates a weighted combination of control units that closely resembles the treated unit before intervention, then uses these weights to estimate what would have happened without treatment.
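As a sketch of the idea in symbols (the notation is assumed here: unit 1 is the treated unit, units $j = 2, \dots, J+1$ are the controls, $Y_{jt}$ is the outcome of unit $j$ at time $t$, and $T_0$ is the last pre-intervention period), the synthetic control is a weighted average with non-negative weights that sum to one:

$$
\hat{Y}_{1t}^{N} = \sum_{j=2}^{J+1} w_j \, Y_{jt}, \qquad w_j \ge 0, \qquad \sum_{j=2}^{J+1} w_j = 1
$$

In this simplified, outcome-only version the weights are chosen to make the pre-intervention gap as small as possible:

$$
\min_{w} \; \sum_{t \le T_0} \Bigl( Y_{1t} - \sum_{j=2}^{J+1} w_j \, Y_{jt} \Bigr)^{2}
$$

The estimated effect for $t > T_0$ is then $Y_{1t} - \hat{Y}_{1t}^{N}$. Full applications usually also match on covariates, which this sketch leaves out.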
Adjust the weights of control states to create a synthetic California that matches the real California's tobacco consumption before the policy:
In 1988, California implemented a tobacco control program with increased taxes and anti-smoking initiatives. Looking at the data:
| Year | California | Nevada | Oregon | Washington |
|---|---|---|---|---|
| 1985 | 120 | 140 | 125 | 115 |
| 1990 | 100 | 135 | 120 | 110 |
| 1995 | 80 | 130 | 115 | 105 |
| 2000 | 60 | 125 | 110 | 100 |
The challenge: Which state is the best comparison? Or should we use a combination?
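One way to explore the "combination" option is to apply the recipe from the secret-sauce analogy (30% Nevada, 45% Washington, 25% Oregon) to the table values. The sketch below assumes those weights are simply given rather than estimated, and the function name `synthetic_value` is made up for illustration.

```python
# Values copied from the illustrative table above.
data = {
    1985: {"California": 120, "Nevada": 140, "Oregon": 125, "Washington": 115},
    1990: {"California": 100, "Nevada": 135, "Oregon": 120, "Washington": 110},
    1995: {"California": 80, "Nevada": 130, "Oregon": 115, "Washington": 105},
    2000: {"California": 60, "Nevada": 125, "Oregon": 110, "Washington": 100},
}

# Recipe proportions in percent (taken as given, not estimated here).
weights_pct = {"Nevada": 30, "Washington": 45, "Oregon": 25}


def synthetic_value(row, weights_pct):
    """Weighted average of the donor states for a single year."""
    return sum(pct * row[state] for state, pct in weights_pct.items()) / 100


# Synthetic California and its gap to the real California, year by year.
synthetic = {year: synthetic_value(row, weights_pct) for year, row in data.items()}
gaps = {year: data[year]["California"] - synthetic[year] for year in data}

for year in sorted(data):
    print(year, data[year]["California"], synthetic[year], gaps[year])
```

Students can change `weights_pct` and watch how the 1985 gap (the only pre-policy year in this table) and the later gaps respond. Because the table is illustrative, the gaps are a classroom exercise rather than an estimate of the real policy effect.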
| Year | State A (Treated) | State B | State C | State D | State E |
|---|---|---|---|---|---|
| 2018 | 100 | 90 | 110 | 105 | 95 |
| 2019 | 105 | 92 | 114 | 108 | 98 |
| 2020 | 115 | 94 | 117 | 110 | 100 |
| 2021 | 130 | 97 | 120 | 112 | 103 |
| Technical Term | Student-Friendly Language | Visual/Data Example |
|---|---|---|
| Counterfactual | "What would have happened otherwise" | The dashed line showing predicted cigarette sales without the policy |
| Synthetic Control Weights | "Recipe proportions" or "Mixing ingredients" | Nevada: 30%, Oregon: 25%, Washington: 45% |
| Pre-treatment fit | "How well our copy matches before the change" | How closely synthetic California tracks real California before 1988 |
| Treatment effect | "The difference our change made" | 80 fewer cigarette packs per capita by 2000 |
| Donor pool | "Available comparison ingredients" | The states we can use to build our synthetic version (Nevada, Oregon, etc.) |
| Covariates | "Important matching characteristics" | Demographics, economy, smoking regulations before treatment |
Show students this SCM graph and ask:
Questions:
Provide this mini-dataset and have students:
| Year | Treated | Control 1 | Control 2 | Control 3 |
|---|---|---|---|---|
| 2018 | 10 | 12 | 8 | 9 |
| 2019 | 11 | 13 | 9 | 10 |
| 2020* | 15 | 14 | 10 | 11 |

*Treatment implemented in 2020.
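For instructors who want a reference answer, the following is a minimal sketch that searches for weights on this mini-dataset. It assumes a coarse grid search in steps of 0.05 as a stand-in for a proper optimizer, and the names `STEPS` and `pre_treatment_error` are illustrative only.

```python
# Mini-dataset from the table above.
pre_treated = [10, 11]                        # Treated unit, 2018 and 2019
pre_controls = [[12, 13], [8, 9], [9, 10]]    # Control 1, 2, 3 in 2018 and 2019
post_treated = 15                             # Treated unit, 2020
post_controls = [14, 10, 11]                  # Control 1, 2, 3 in 2020

STEPS = 20  # weights move in units of 1/20 = 0.05


def pre_treatment_error(units):
    """Sum of squared pre-treatment gaps, scaled by STEPS**2 to stay in integers."""
    return sum(
        (STEPS * y - sum(k * c[t] for k, c in zip(units, pre_controls))) ** 2
        for t, y in enumerate(pre_treated)
    )


# Every triple of non-negative integers that sums to STEPS,
# i.e. every weight vector on the 0.05 grid that sums to 1.
candidates = [
    (a, b, STEPS - a - b)
    for a in range(STEPS + 1)
    for b in range(STEPS + 1 - a)
]

best_units = min(candidates, key=pre_treatment_error)
weights = [k / STEPS for k in best_units]

# Counterfactual for 2020 and the implied treatment effect.
synthetic_2020 = sum(k * c for k, c in zip(best_units, post_controls)) / STEPS
effect_2020 = post_treated - synthetic_2020

print("weights:", weights)
print("synthetic 2020:", synthetic_2020, "effect:", effect_2020)
```

More than one recipe fits the two pre-treatment years exactly (for example, 50% Control 1 and 50% Control 2), so the weights printed are only the first perfect fit the search finds. In this dataset every control rises by the same amount each year, so all perfect-fit recipes give the same 2020 counterfactual.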
Question: The chart below shows a counterfactual analysis of houses with and without fireplaces:
Based on this data visualization, what is the estimated causal effect of adding a fireplace to a house?
| Common Challenge | Data-Driven Teaching Solution |
|---|---|
| Confusing counterfactuals with predictions | Show side-by-side visuals: counterfactual (alternative present) vs. forecast (future prediction) |
| Struggling with weight optimization | Use interactive tools where students can adjust weights and see pre-treatment fit improve |
| Difficulty identifying good control units | Show pre-treatment trend charts for multiple potential controls and discuss similarities/differences |
| Overconfidence in results | Use placebo tests where students apply SCM to units that didn't receive treatment |
| Poor understanding of when to use SCM | Present datasets with varying characteristics and ask which would be suited for SCM vs. other methods |
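The placebo-test idea from the table can be tried on the mini-dataset from the exercise earlier in this guide (Treated and Controls 1 to 3, with treatment in 2020). The sketch below is one possible classroom version under stated assumptions: each control is treated as if it had received the intervention, its donor pool excludes the truly treated unit, and weights come from a coarse grid search rather than a proper optimizer. The helper names `compositions` and `synthetic_gap` are hypothetical.

```python
# Mini-dataset: values for 2018, 2019 (pre-treatment) and 2020 (post-treatment).
series = {
    "Treated": [10, 11, 15],
    "Control 1": [12, 13, 14],
    "Control 2": [8, 9, 10],
    "Control 3": [9, 10, 11],
}
N_PRE = 2    # number of pre-treatment years
STEPS = 20   # weights move in units of 1/20 = 0.05


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def synthetic_gap(target, donors):
    """Fit weights on the pre-treatment years; return (pre gaps, post gaps)."""

    def scaled_gaps(units):
        # Gap between target and weighted donors, multiplied by STEPS.
        return [
            STEPS * target[t] - sum(k * d[t] for k, d in zip(units, donors))
            for t in range(len(target))
        ]

    def pre_error(units):
        return sum(g * g for g in scaled_gaps(units)[:N_PRE])

    best = min(compositions(STEPS, len(donors)), key=pre_error)
    gaps = [g / STEPS for g in scaled_gaps(best)]
    return gaps[:N_PRE], gaps[N_PRE:]


controls = [name for name in series if name != "Treated"]

# Real analysis: the treated unit against all controls.
results = {"Treated": synthetic_gap(series["Treated"], [series[c] for c in controls])}

# Placebo runs: pretend each control was treated, using the other controls as donors.
for placebo in controls:
    donors = [series[c] for c in controls if c != placebo]
    results[placebo] = synthetic_gap(series[placebo], donors)

for name, (pre, post) in results.items():
    print(name, "pre gaps:", pre, "post gaps:", post)
```

A useful discussion point: Control 1 shows a 2020 gap as large as the treated unit's, but its pre-treatment gaps are just as large, so that gap reflects a poor match rather than an effect. A placebo gap is only informative when the pre-treatment fit is good.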