Week 3: Causal Machine Learning

DATA5000 - Artificial Intelligence Programming in Business Analytics

Understanding cause and effect in artificial intelligence

What We'll Cover Today

Core Concepts

The difference between correlation and causation
Understanding confounders
Four types of treatment effects
Why causal analysis matters for business decisions

Practical Tools

SHAP for explaining predictions
EconML for finding causes
Real business applications

The Four Types of Analytics

Type	Question	Example
Descriptive	What happened?	"Our sales dropped 20% last month"
Diagnostic	Why did it happen?	"Sales dropped because our competitor launched a sale"
Predictive	What will happen?	"Sales will likely drop another 15% next month"
Prescriptive	What should we do?	"Offer free shipping and extend hours to recover 80% of lost sales"

Today we focus on moving from prediction to prescription using causal analysis

Prescriptive Analytics in Action

Netflix: From Prediction to Recommendation

Predictive Analytics says:
"This user will likely watch more shows"

Prescriptive Analytics says:
"Show this user Season 2 of Stranger Things NOW because that specific recommendation will cause them to stay subscribed"

The Key Difference

Prediction tells you WHAT will happen

Prescription tells you HOW to change it

Business value comes from changing outcomes, not just predicting them

Understanding Correlation vs. Causation

The Ice Cream Example

Observation: Ice cream sales and drowning deaths both increase in summer

Wrong Conclusion: "Ice cream causes drowning! Ban ice cream!"

Right Conclusion: Hot weather causes BOTH:

More people buy ice cream
More people go swimming, leading to more drowning incidents

Key Lesson

Just because two things happen together doesn't mean one causes the other. There might be a third factor causing both.
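To see the ice cream lesson in numbers, here is a small simulation sketch. All figures are made up for illustration: temperature drives both series, and neither series affects the other. The helper names (`corr`, `residuals`) are ours, not from any library.

```python
import random
from statistics import fmean

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing its straight-line fit on xs."""
    mx, my = fmean(xs), fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)
# Hypothetical daily data: temperature drives both series.
temperature = [rng.uniform(10, 35) for _ in range(5000)]
ice_cream = [20 * t + rng.gauss(0, 60) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

raw = corr(ice_cream, drownings)
adjusted = corr(residuals(ice_cream, temperature),
                residuals(drownings, temperature))
print(f"raw correlation: {raw:.2f}")
print(f"after removing temperature: {adjusted:.2f}")
```

The raw correlation comes out strongly positive, while the correlation left after removing temperature is close to zero. That is the signature of a third factor causing both.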

Correlation vs. Causation: Business Example

Coffee Shop Scenario

Observation: Customers who buy muffins also buy coffee

Correlation: Muffin buyers tend to buy coffee

Causation Question: Does offering a muffin discount CAUSE more coffee sales?

Better Question: What if we offered a "coffee + muffin" combo? Would that CAUSE higher total revenue?

Why This Matters

If you just see correlation, you might discount muffins expecting coffee sales to rise. But if morning customers naturally buy both, the discount just loses you money.

Example Data: Age as a Confounder

Look at this gym membership data. Notice the pattern:

Person	Age	Exercise (hrs/week)	Health Score (0-100)
John	25	5.0	85
Mary	28	4.5	82
Alex	32	4.0	80
Lisa	45	3.0	72
Tom	48	2.5	70
Sarah	52	2.0	68
Bob	61	1.5	62
Carol	65	1.0	58
Dan	68	0.5	55

What Do You Notice?

🟢 Young people (20s-30s): Exercise MORE (4-5 hrs) → Health Score HIGHER (80-85)

🟡 Middle age (40s-50s): Exercise LESS (2-3 hrs) → Health Score MEDIUM (68-72)

🔴 Older people (60s+): Exercise LEAST (0.5-1.5 hrs) → Health Score LOWER (55-62)

The Question: Does exercise cause better health? Or does AGE affect BOTH exercise habits AND health?
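A quick check of the table above, as a sketch: the numbers are copied from the nine rows, and `corr` is a small helper written for this example.

```python
from statistics import fmean

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Values from the gym membership table (John ... Dan).
age = [25, 28, 32, 45, 48, 52, 61, 65, 68]
exercise = [5.0, 4.5, 4.0, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5]
health = [85, 82, 80, 72, 70, 68, 62, 58, 55]

print(f"exercise vs health: {corr(exercise, health):+.2f}")
print(f"age vs exercise:    {corr(age, exercise):+.2f}")
print(f"age vs health:      {corr(age, health):+.2f}")
```

All three pairs are very strongly correlated. Because age and exercise move together almost perfectly in this table, the data alone cannot tell us how much of the health gap is due to exercise and how much is due to age.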

Confounders Explained Simply

Definition

A confounder is a third factor, often unmeasured, that affects both your action AND your result. It can make the action look more (or less) effective than it really is.

            CONFOUNDER
          (Customer Age)
           ↙          ↘
      Action     →     Result
(Exercise Habits) → (Health Score)

Without Controlling for Age

Young people exercise more AND are healthier

Conclusion: "Exercise causes better health!" (Partially true, but age explains part of the gap)

With Age Controlled

Compare 30-year-olds who exercise vs. don't exercise

Compare 60-year-olds who exercise vs. don't exercise

Now we see the TRUE effect of exercise at each age level
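Here is a minimal sketch of "controlling for age" by comparing within age groups. The data is simulated under assumptions we chose: exercise adds 5 health points, being older subtracts 15, and younger people exercise more often. None of these numbers come from real data.

```python
import random
from statistics import fmean

rng = random.Random(1)
TRUE_EFFECT = 5.0  # assumed health-score gain from exercising

people = []
for _ in range(20000):
    older = rng.random() < 0.5
    # Assumption: younger people are more likely to exercise (80% vs 20%).
    exercises = rng.random() < (0.2 if older else 0.8)
    health = 75 - 15 * older + TRUE_EFFECT * exercises + rng.gauss(0, 5)
    people.append((older, exercises, health))

def gap(rows):
    """Mean health of exercisers minus mean health of non-exercisers."""
    yes = [h for _, e, h in rows if e]
    no = [h for _, e, h in rows if not e]
    return fmean(yes) - fmean(no)

naive = gap(people)                                  # ignores age
within_young = gap([p for p in people if not p[0]])  # same-age comparison
within_old = gap([p for p in people if p[0]])
# Weight each age group's gap by its share of the population.
share_old = fmean([1.0 if p[0] else 0.0 for p in people])
adjusted = (1 - share_old) * within_young + share_old * within_old

print(f"naive gap:        {naive:.1f}")
print(f"age-adjusted gap: {adjusted:.1f} (true effect is {TRUE_EFFECT})")
```

The naive comparison mixes the effect of exercise with the effect of being young, so it overstates the benefit. The within-age comparison lands close to the effect we built into the simulation.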

Real Business Confounders

Business Scenario	Apparent Relationship	Hidden Confounder
Premium members buy more	Premium status → Higher spending	High income causes BOTH premium membership AND more spending
Email opens lead to purchases	Opening emails → Buying	Brand loyalty causes BOTH email engagement AND purchases
Training increases productivity	Training → Performance	Motivated employees seek training AND perform better
Ads drive sales	Ad clicks → Purchases	Purchase intent causes BOTH ad clicking AND buying

Exercise 1: Spot the Confounder

Scenario

An e-commerce company finds that customers who view product reviews spend 40% more.

Question

Should they force ALL customers to view reviews?

Think About

What type of customers naturally read reviews?
Could there be a hidden factor (confounder)?
Would showing reviews to uninterested customers have the same effect?

Discuss with your group for 3 minutes

Exercise 1: Answer

The Hidden Confounder: Customer Engagement Level

What's really happening:

Engaged, interested customers READ reviews
Engaged, interested customers ALSO spend more
The engagement level is the hidden factor affecting both

Conclusion: Simply showing reviews to uninterested customers won't cause the same spending increase. They're not engaged enough to care.

The Right Approach

Identify what makes customers engaged, then work on increasing engagement rather than just forcing review views.

What Are Treatment Effects?

What is a "Treatment"?

Any action, policy, or intervention you might take:

Sending a discount email
Offering free shipping
Hiring more staff
Launching a new product feature

What is a "Treatment Effect"?

The actual impact that action has on your outcome (sales, satisfaction, retention, etc.)

Why "Treatment"?

The term comes from medical research (does this treatment cure the disease?), but in business:

Treatment = Your business action
Effect = What it actually causes to happen

Critical Question: Where Does Treatment Data Come From?

The Challenge

To estimate treatment effects, you need to observe BOTH:

Treated group: People who received the action/intervention
Control group: People who did NOT receive it

Three Real-World Scenarios:

Scenario 1: Historical Data (Most Common)

The treatment already happened naturally

✅ Some customers signed contracts (treated)
✅ Others stayed month-to-month (control)
✅ You observe both groups in your data
⚠️ Problem: Self-selection bias

Scenario 2: A/B Test (Gold Standard)

You randomly assign treatment

✅ Randomly show free shipping to 50%
✅ Other 50% see regular shipping
✅ No self-selection bias
✅ Clean causal estimates
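A sketch of why random assignment gives clean estimates. The data is simulated, and the $12 lift and order-value distribution are assumptions for illustration only.

```python
import random
from statistics import fmean, stdev

rng = random.Random(2)
TRUE_LIFT = 12.0  # assumed effect of free shipping on order value, in dollars

treated, control = [], []
for _ in range(10000):
    baseline = rng.gauss(60, 20)  # hypothetical order value without the offer
    if rng.random() < 0.5:        # coin flip: assignment ignores who the customer is
        treated.append(baseline + TRUE_LIFT)
    else:
        control.append(baseline)

# With random assignment, a simple difference in means estimates the effect.
estimate = fmean(treated) - fmean(control)
std_err = (stdev(treated) ** 2 / len(treated)
           + stdev(control) ** 2 / len(control)) ** 0.5
low, high = estimate - 1.96 * std_err, estimate + 1.96 * std_err

print(f"estimated lift: ${estimate:.2f} (95% CI ${low:.2f} to ${high:.2f})")
```

Because the coin flip decides who gets free shipping, the two groups are alike on average in everything else, so no confounder adjustment is needed.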

Scenario 3: Natural Experiment (Policy Changes)

External event creates treatment/control groups

Example: Government policy applies to one region but not another
Compare regions before and after policy change
Useful when experiments are impossible or unethical
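One common way to analyze the "before and after, two regions" comparison is difference-in-differences. The sketch below uses made-up sales figures and hypothetical region names. It also assumes both regions would have followed the same trend without the policy.

```python
# Hypothetical average weekly sales per store, in dollars.
sales = {
    ("policy_region", "before"): 1000.0,
    ("policy_region", "after"): 1150.0,
    ("comparison_region", "before"): 900.0,
    ("comparison_region", "after"): 980.0,
}

change_policy = (sales[("policy_region", "after")]
                 - sales[("policy_region", "before")])
change_comparison = (sales[("comparison_region", "after")]
                     - sales[("comparison_region", "before")])

# Difference-in-differences: the comparison region's change stands in for
# what would have happened in the policy region without the policy.
did_estimate = change_policy - change_comparison
print(did_estimate)  # 70.0
```

The policy region grew by $150, but $80 of that growth also happened where there was no policy, so only $70 is attributed to the policy.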

EconML's Power

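EconML works with Scenario 1 (observational data) by statistically controlling for the confounders you have measured. This lets you estimate causal effects without running an experiment, but the estimate is only as good as your coverage of the important confounders.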
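To build intuition for "controlling for confounders statistically", here is a hand-rolled sketch of the partialling-out idea behind Double Machine Learning. This is not EconML's API. Simple straight-line fits stand in for the ML models, the data is simulated, and every name and number is an assumption for illustration.

```python
import random
from statistics import fmean

def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ x."""
    mx, my = fmean(xs), fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

rng = random.Random(3)
TRUE_EFFECT = 2.0  # assumed effect of one unit of treatment on the outcome
n = 4000
w = [rng.gauss(0, 1) for _ in range(n)]        # confounder (e.g. income)
t = [1.5 * wi + rng.gauss(0, 1) for wi in w]   # treatment depends on w
y = [TRUE_EFFECT * ti + 10 * wi + rng.gauss(0, 1) for ti, wi in zip(t, w)]

naive_slope, _ = fit_line(t, y)  # ignores the confounder

# Cross-fitting: fit the helper models on one half, residualize the other half.
folds = [range(0, n // 2), range(n // 2, n)]
t_res, y_res = [], []
for train, test in (folds, folds[::-1]):
    st, it = fit_line([w[i] for i in train], [t[i] for i in train])
    sy, iy = fit_line([w[i] for i in train], [y[i] for i in train])
    t_res += [t[i] - (st * w[i] + it) for i in test]  # treatment not explained by w
    y_res += [y[i] - (sy * w[i] + iy) for i in test]  # outcome not explained by w

# Regress leftover outcome on leftover treatment.
dml_estimate = sum(a * b for a, b in zip(t_res, y_res)) / sum(a * a for a in t_res)

print(f"naive slope:   {naive_slope:.2f}")
print(f"partialled-out: {dml_estimate:.2f} (true effect is {TRUE_EFFECT})")
```

The naive slope is far too large because the confounder pushes treatment and outcome in the same direction. After removing what the confounder explains from both, the estimate lands near the effect built into the simulation. This only works here because the confounder is measured.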

The Four Types of Treatment Effects

1. Average Treatment Effect (ATE)

Question: "Does it work overall?"

Measures: Impact across EVERYONE

Example: "Free shipping increases average order value by $12"

2. Heterogeneous Treatment Effect (HTE)

Question: "Does it work differently for different groups?"

Measures: How impact varies by group

Example: "Free shipping increases orders by 30% for students but decreases orders by 5% for seniors"
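A sketch of ATE versus HTE on simulated experiment data. The segment effects are assumptions we picked, expressed in dollars rather than the percentages in the example above: free shipping helps students and slightly hurts seniors.

```python
import random
from statistics import fmean

rng = random.Random(4)
# Assumed true effects of free shipping on order value ($), by segment.
TRUE_EFFECT = {"student": 15.0, "senior": -3.0}

rows = []
for _ in range(20000):
    segment = rng.choice(["student", "senior"])
    treated = rng.random() < 0.5  # randomized, as in an A/B test
    order = 50 + TRUE_EFFECT[segment] * treated + rng.gauss(0, 10)
    rows.append((segment, treated, order))

def effect(subset):
    """Difference in mean order value: treated minus control."""
    treated_orders = [o for _, t, o in subset if t]
    control_orders = [o for _, t, o in subset if not t]
    return fmean(treated_orders) - fmean(control_orders)

ate = effect(rows)  # one number for everyone
hte = {seg: effect([r for r in rows if r[0] == seg]) for seg in TRUE_EFFECT}

print(f"ATE: ${ate:.2f}")
for seg, value in hte.items():
    print(f"  effect for {seg}s: ${value:.2f}")
```

The ATE is positive, which hides the fact that one segment responds negatively. Looking at the effect by group is what reveals it.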

The Four Types of Treatment Effects (Continued)

3. Local Average Treatment Effect (LATE)

Question: "Does it work for those actually affected?"

Measures: Impact only on those whose behavior changed because of the offer (the "compliers")

Example: "For customers who used the free shipping offer, orders increased by $25"

4. Conditional Average Treatment Effect (CATE)

Question: "Does it work for people like YOU?"

Measures: Average impact among customers who share specific characteristics

Example: "For 25-year-old female customers in Sydney who shop on weekends, free shipping increases orders by $18"
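For LATE, a standard calculation is the Wald estimator: divide the effect of being offered the treatment by the share of people who actually took it up. The numbers below are hypothetical. The sketch also assumes the offer was randomly assigned and affects orders only through actual use of free shipping.

```python
# Hypothetical results from randomly OFFERING free shipping (not everyone uses it).
mean_order_offered = 62.0      # average order value, customers who got the offer
mean_order_not_offered = 52.0  # average order value, customers who did not
uptake_offered = 0.40          # share of offered customers who used free shipping
uptake_not_offered = 0.0       # nobody can use an offer they never received

# Intention-to-treat effect: impact of the offer averaged over everyone offered.
itt = mean_order_offered - mean_order_not_offered
# Wald estimator: scale the ITT by how much the offer changed actual usage.
late = itt / (uptake_offered - uptake_not_offered)

print(f"ITT = ${itt:.2f}, LATE = ${late:.2f}")  # ITT = $10.00, LATE = $25.00
```

The offer raises the average order by $10 across everyone offered, but all of that comes from the 40% who used it, so the effect for them is $25.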

Exercise 2: Which Treatment Effect?

Match the business question to the right treatment effect type

"Should we roll out free shipping to all customers?"
"Does free shipping work better for our VIP customers or regular customers?"
"For customers who actually used the free shipping offer, how much did their spending increase?"
"For a 35-year-old customer in Melbourne who shops monthly, what impact would free shipping have?"

Options

ATE (Average Treatment Effect)
HTE (Heterogeneous Treatment Effect)
LATE (Local Average Treatment Effect)
CATE (Conditional Average Treatment Effect)

Exercise 2: Answers

Correct Matches

"Should we roll out free shipping to all customers?"
→ ATE (Average Treatment Effect)
You need to know the overall average impact
"Does free shipping work better for our VIP customers or regular customers?"
→ HTE (Heterogeneous Treatment Effect)
You're comparing effects across different groups
"For customers who actually used the free shipping offer, how much did their spending increase?"
→ LATE (Local Average Treatment Effect)
You're measuring the effect on those who were actually affected
"For a 35-year-old customer in Melbourne who shops monthly, what impact would free shipping have?"
→ CATE (Conditional Average Treatment Effect)
You're getting personalized predictions based on specific characteristics

CATE Example: Personalization in Action

Business Scenario: Streaming Service Recommendations

Problem: Should we recommend action movies or comedies to a new subscriber?

For users who are:
- Male, Age 18-25, watches on weekends, prefers mobile
→ Action movies increase viewing time by 45 minutes

For users who are:
- Female, Age 18-25, watches on weekends, prefers mobile  
→ Comedies increase viewing time by 38 minutes

Business Value

Instead of one recommendation for everyone (ATE), personalize based on characteristics (CATE)

Result: Higher engagement, lower churn, more subscription renewals

The Boston Housing Dataset: Why This Matters

The Historical Context

1970s Boston housing data
Shows how correlation can mislead
Demonstrates why causal analysis is essential for ethical AI

The Problem

Neighborhoods with more Black residents had lower home prices.

Correlation-Based Model Would Say:
"Ethnicity predicts lower home prices"

Causal Analysis Reveals

Discriminatory policies forced Black residents into areas with worse infrastructure
Poor infrastructure (pollution, crime, smaller houses) CAUSED lower prices
Ethnicity was correlated but NOT causal

Boston Housing Dataset: Business Lesson

Why This Matters for Business Analytics

Predictive models can perpetuate bias and historical injustice

Causal models help us understand what we can actually change

Predictive Approach

"Use ethnicity as a feature because it predicts prices well"

Result: Perpetuates discrimination

Causal Approach

"Identify what actually causes price differences: infrastructure, pollution, crime"

Result: Focus on factors we can actually change

Today's Exercise

We'll use this dataset to practice identifying true causal factors while avoiding bias

Exercise 3: Telecom Churn Scenario

You're analyzing why customers leave your telecom company

Data shows:

Customers with more service calls churn more
Customers with month-to-month contracts churn more
Customers with higher bills churn more

Questions

Are these correlations or causations?
What might be confounders?
What treatment could you test?
Which type of treatment effect would help most?

Discuss in your group for 5 minutes

Exercise 3: Possible Answers

1. Correlation or Causation?

Service calls: Likely causal (problems cause churn)
Month-to-month contracts: Could be correlation (uncertain customers choose flexible contracts AND are more likely to leave)
Higher bills: Could be both (high bills cause churn, but also price-insensitive customers pay more and churn less)

2. Possible Confounders

Customer satisfaction affects both service calls AND churn
Financial stability affects both contract choice AND churn
Competitive offerings affect both price sensitivity AND churn

Exercise 3: Possible Answers (Continued)

3. Possible Treatments

Improve service quality (reduce need for calls)
Offer contract incentives (discounts for longer commitments)
Provide personalized pricing (based on usage patterns)
Proactive support (reach out before problems occur)

4. Best Treatment Effect Type

CATE (Conditional Average Treatment Effect): Personalize interventions based on customer profile

Example: "For high-value customers with service issues, offering a dedicated support line reduces churn by 25%"

HTE (Heterogeneous Treatment Effect): Different strategies for different customer segments

Example: "Contract incentives work for price-sensitive customers but not for quality-focused customers"

From Prediction to Action: The Key Difference

Predictive ML Says

"Customers who call service 3+ times will likely churn"

You can predict, but what do you DO about it?

Causal ML Says

"Reducing service calls by 50% through proactive support will decrease churn by 15%"

You have an actionable strategy

Why This Matters

Prediction tells you WHAT will happen
Causation tells you HOW to change it
Business value comes from changing outcomes, not just predicting them

SHAP vs. EconML: Two Tools, Two Jobs

SHAP (Explainable AI)

Job: Explain predictions

Answers: "Which features contributed most to this prediction?"

Type: Correlation-based

Example: "The model predicts this customer will churn mainly because of their high bill, many service calls, and month-to-month contract"

EconML (Causal ML)

Job: Identify causes

Answers: "Which features, if changed, will change the outcome?"

Type: Causation-based

Example: "Reducing this customer's bill by $10 will decrease their churn probability by 12%"

Using Them Together

Use SHAP to understand your predictive model
Use EconML to find what you can actually change
Make data-driven decisions with confidence

Today's Hands-On Exercise Structure

Step 1: Linear Regression (15 minutes)

Build intuition with simple model
Understand basic relationships

Step 2: Predictive ML + SHAP (25 minutes)

Train a Random Forest model
Use SHAP to find strongest predictors
Interpret correlation-based insights

Step 3: Causal ML + EconML (25 minutes)

Apply Double Machine Learning
Identify true causal factors
Compare with predictive insights

Step 4: Business Recommendations (10 minutes)

What actions would you take?
What would you change first?
How would you measure success?

SHAP: Understanding Model Predictions

What is SHAP?

SHAP (SHapley Additive exPlanations) explains ML predictions by showing how much each feature contributed to a specific prediction.

Based on game theory: distributes "credit" for a prediction fairly among all features.
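To make the "fair credit" idea concrete, here is a sketch that computes exact Shapley values for a tiny made-up scoring function. This is not the `shap` library. It uses a single baseline customer instead of a background dataset, and the model, feature names, and coefficients are all hypothetical.

```python
from itertools import permutations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution over
    every order in which features can be switched from baseline to instance."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            new = predict(current)
            totals[name] += new - previous
            previous = new
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}

def toy_model(x):
    """Hypothetical churn score with an interaction term (not a trained model)."""
    score = 0.30 + 0.05 * x["service_calls"] + 0.20 * x["month_to_month"]
    score += 0.03 * x["service_calls"] * x["month_to_month"]
    return score

baseline = {"service_calls": 0, "month_to_month": 0, "device_protection": 0}
customer = {"service_calls": 4, "month_to_month": 1, "device_protection": 1}

phi = shapley_values(toy_model, customer, baseline)
for name, value in phi.items():
    print(f"{name:18s} {value:+.2f}")
print(f"baseline {toy_model(baseline):.2f} + contributions "
      f"{sum(phi.values()):.2f} = prediction {toy_model(customer):.2f}")
```

Two things to notice: the contributions add up exactly to the gap between the baseline score and this customer's score, and a feature the model never uses (`device_protection` here) gets zero credit. The interaction between calls and contract type is split evenly between the two features.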

✅ What SHAP Tells You

Which features matter most
Positive or negative impact
Individual prediction breakdown
Correlation patterns

❌ What SHAP Does NOT Tell You

Causal relationships
What happens if you change X
Which actions to take
Causation

🏢 Business Example: Customer Churn

SHAP says: "Customer service calls strongly predict churn" (correlation)

Does NOT mean: "Reducing service calls will reduce churn" (causation)

Why? Service calls might be a symptom of underlying product issues (confounder)

Three Essential SHAP Visualizations

1️⃣ Feature Importance Bar Chart

What it shows: Overall ranking of features by average absolute impact

Business use: "Which 5 factors matter most for customer churn?"

Contract Status     ████████████████ 0.42
Service Calls       ███████████      0.31
Monthly Charges     █████████        0.25
Account Age         ██████           0.18
Data Usage          ████             0.12

Longer bar = larger average contribution to the model's predictions (an association, not a causal effect)
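The bar lengths are the average absolute SHAP value per feature. A sketch of that calculation on a tiny, made-up table of SHAP values (four customers, three features; the numbers are illustrative only):

```python
from statistics import fmean

# Hypothetical SHAP values: one row per customer, one column per feature.
feature_names = ["contract_status", "service_calls", "monthly_charges"]
shap_rows = [
    [0.50, -0.20, 0.10],
    [-0.40, 0.35, -0.30],
    [0.45, 0.30, 0.25],
    [-0.35, -0.40, 0.35],
]

# Importance = mean of |SHAP value| down each column.
importance = {
    name: fmean([abs(row[j]) for row in shap_rows])
    for j, name in enumerate(feature_names)
}
ranking = sorted(importance, key=importance.get, reverse=True)

for name in ranking:
    print(f"{name:16s} {importance[name]:.2f}")
```

Absolute values matter here: `contract_status` pushes some predictions up and others down, so its plain average would be near zero even though it moves predictions the most.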

2️⃣ Summary Plot (Beeswarm)

What it shows: Distribution of SHAP values across all data points

Colors: Red = high feature value, Blue = low feature value

Use case: "Do high service calls increase or decrease churn probability?"

3️⃣ Waterfall Plot

What it shows: Step-by-step breakdown of ONE prediction

Use case: "Why did the model predict THIS customer will churn?"

Shows: Base value → Feature impacts → Final prediction

SHAP Waterfall: Individual Prediction Breakdown

🎯 Example: Predicting Churn for Customer #247

Final Prediction: 78% churn probability (High Risk!)

How the Model Got There

Starting Point (Base Value)	32% (average churn)
+ Month-to-month contract	+25%
+ 8 service calls (very high)	+18%
- Has device protection	-5%
+ High monthly charges ($89)	+8%
Final Prediction	78% churn risk
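The waterfall is just a running sum. This sketch replays the numbers from the table above for Customer #247, treating each contribution as percentage points:

```python
base_value = 32  # average churn probability, in percent
contributions = {
    "Month-to-month contract": +25,
    "8 service calls": +18,
    "Has device protection": -5,
    "High monthly charges ($89)": +8,
}

running = base_value
print(f"{'Base value':30s} {running:>4d}%")
for feature, points in contributions.items():
    running += points  # each feature moves the prediction up or down
    print(f"{feature:30s} {points:+4d} -> {running}%")

print(f"Final prediction: {running}%")
```

Starting at 32% and applying the four contributions in turn gives the 78% shown on the slide.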

⚠️ Critical Reminder: This is CORRELATION, not CAUSATION

These features predict churn, but we cannot conclude that changing them will reduce churn.

For causal effects, we need EconML.

SHAP vs EconML: When to Use Each

Aspect	SHAP (Correlation)	EconML (Causation)
Question	"Which features predict the outcome?"	"Which features CAUSE the outcome?"
Output	Feature importance scores	Treatment effect estimates
Handles Confounders?	❌ No - shows all correlations	✅ Yes - adjusts for measured confounders
Business Use	Understanding patterns, debugging models	Making decisions, taking actions
Example Insight	"Service calls have the second-largest average SHAP impact (0.31)"	"Reducing service calls by 1 causes 5% churn reduction"

🎯 The Ideal Workflow

Step 1: Build predictive ML model (LightGBM, XGBoost, etc.)
Step 2: Use SHAP to understand correlations and generate hypotheses
Step 3: Use EconML to test causal hypotheses and estimate treatment effects
Step 4: Make business decisions based on CAUSAL effects, not correlations
Step 5: Use CATE to personalize actions for different customer segments

⛔ The Trap: Using SHAP for Decisions

Wrong: "SHAP says service calls are important → reduce service calls"

Why wrong: Calls might just correlate with product issues. Reducing calls without fixing root causes won't help.

Right: Use EconML to find the causal effect of improving service quality
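The trap can be shown with a small simulation. Everything here is an assumption chosen for illustration: product issues cause both service calls and churn, and calls themselves have no effect on churn.

```python
import random
from statistics import fmean

rng = random.Random(5)

def simulate(suppress_calls=False, fix_product=False, n=20000):
    """Return (service_calls, churned) pairs under hypothetical interventions."""
    rows = []
    for _ in range(n):
        has_issue = (rng.random() < 0.3) and not fix_product
        calls = rng.randint(3, 8) if has_issue else rng.randint(0, 2)
        if suppress_calls:
            calls = min(calls, 1)  # e.g. making support harder to reach
        # Assumption: churn depends on the product issue, never on calls themselves.
        churned = rng.random() < (0.6 if has_issue else 0.1)
        rows.append((calls, churned))
    return rows

def churn_rate(rows):
    return fmean([1.0 if churned else 0.0 for _, churned in rows])

observed = simulate()
high = churn_rate([r for r in observed if r[0] >= 3])  # frequent callers
low = churn_rate([r for r in observed if r[0] < 3])    # infrequent callers
overall = churn_rate(observed)
after_suppressing = churn_rate(simulate(suppress_calls=True))
after_fixing = churn_rate(simulate(fix_product=True))

print(f"churn, 3+ calls vs fewer: {high:.0%} vs {low:.0%}")
print(f"overall churn:            {overall:.0%}")
print(f"after suppressing calls:  {after_suppressing:.0%}")
print(f"after fixing the product: {after_fixing:.0%}")
```

Calls are an excellent predictor of churn in this data, yet forcing calls down leaves churn unchanged. Only fixing the underlying issue moves the outcome.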

Summary & Today's Hands-On Exercise

🎓 Key Concepts Mastered Today

Confounders bias treatment effect estimates if not controlled
Treatment effects (ATE, HTE, CATE, LATE) answer different business questions
SHAP shows correlation - which features predict outcomes
EconML shows causation - which features cause outcomes
Use causal methods (not SHAP alone) when making business decisions

💻 Today's Jupyter Notebook Exercise

You'll apply SHAP and EconML to the TeleConnect churn dataset and see:

Which features are strong predictors (SHAP)
Which features have true causal effects (EconML)
Why correlation ≠ causation with real visualizations
How to make actionable business recommendations

🎯 Preparing for Assessment 1 (Due Week 5)

Your assessment will require:

Applying both SHAP and EconML to a business dataset
Distinguishing correlation from causation
Identifying confounders and treatment effects
Making actionable, causal-based business recommendations

💡 Remember the Core Principle

Prediction tells you what will happen.

Causation tells you how to change it.

Real business value comes from taking the RIGHT actions, not just predicting outcomes.