DATA5000  ·  Kaplan Business School

Week 2: Introduction to
Predictive Analytics

Machine Learning for Business Intelligence
Artificial Intelligence Programming in Business Analytics
Section 1

Week 1 Recap

Before moving forward, we revisit the core ideas from last week — the distinction between AI and machine learning, the four levels of analytics, and why predictive analytics matters for business today.

Section 1.1 — Week 1 Recap

Artificial Intelligence and Machine Learning

[Diagram: Artificial Intelligence (Expert Systems · Robotics · NLP) contains Machine Learning (Supervised · Unsupervised). ML is a subset of AI.]
Artificial Intelligence

The broad goal of making computers behave intelligently — simulating human reasoning, understanding, and decision-making. It covers many different technologies and approaches.

Machine Learning

A subset of AI. Instead of writing explicit rules, we give the computer data and let it learn patterns automatically. The model improves its predictions the more data it sees.

Key distinction: Traditional programming gives the computer rules. Machine learning gives the computer data and lets it discover the rules itself.
Section 1.2 — Week 1 Recap

The Four Levels of Business Analytics

[Pyramid, top to bottom: PRESCRIPTIVE (What should we do?) · PREDICTIVE (What will happen?) · DIAGNOSTIC (Why did it happen?) · DESCRIPTIVE (What happened?)]
1. DESCRIPTIVE
Sales fell 12% last quarter. Churn rate was 8.3% this month.
2. DIAGNOSTIC
Churn was higher in customers on month-to-month contracts with 3+ complaints.
3. PREDICTIVE — This Week
Customer #4821 has a 78% probability of churning next month.
4. PRESCRIPTIVE — Later in Course
Offer Customer #4821 a 3-month discount. Expected retention gain: $1,200.
Section 1.3 — Week 1 Recap

Where Machine Learning Fits

Machine learning is not a single level of analytics — it is a tool that can be applied across all four levels, with different techniques and goals at each stage.
Level | ML Role | Example Technique
Descriptive | Summarise and segment data | K-Means clustering, PCA
Diagnostic | Find patterns that explain outcomes | Decision trees, SHAP analysis
Predictive (today) | Forecast future outcomes from past data | LightGBM, NeuralProphet
Prescriptive | Recommend the best action to take | Causal ML, Reinforcement Learning
Section 1.4 — Week 1 Recap

Why Predictive Analytics Matters Now

2.5 quintillion bytes
Data Created Daily
Businesses now generate more data than ever. Most of it is never used for decisions.
$4.4T
Projected AI Economic Value
McKinsey estimates generative AI could add up to $4.4 trillion annually across major industries.
67%
Firms Using Predictive ML
Of Fortune 500 companies now use machine learning for forecasting and risk assessment.
More Data

Customer transactions, web clickstreams, IoT sensors, social media — historical data now exists at scale.

More Compute

Cloud platforms (AWS, GCP, Azure) allow businesses to run ML models in hours rather than months.

Accessible Tools

Python libraries like LightGBM and NeuralProphet make production-quality forecasting available to analysts, not just data scientists.

Knowledge Checkpoint — Section 1

Analytics Levels

A telecommunications company reviews its data and finds that 14% of customers cancelled their service last month. A manager then asks: "Which of those customers were most likely to leave, and can we identify similar customers before they cancel next month?" Which analytics level does this second question represent?
Section 2

Supervised Learning

The most widely used form of machine learning in business. We explore how supervised learning works, how data is structured for a model, and the crucial distinction between training performance and real-world performance.

Section 2.1 — Supervised Learning

What is Supervised Learning?

Supervised learning is training a model using labelled examples — data where the correct answer is already known — so the model can predict answers for new, unseen cases.
Step 1: Historical Data (with known outcomes)
Step 2: Train the Model (model learns patterns)
Step 3: Make Predictions (for new, unseen data)

Classification

Predicts a category. The answer falls into one of a fixed set of groups.

Will this customer churn? Yes or No.
Is this transaction fraudulent? Fraud or Legitimate.

Regression

Predicts a number. The answer can be any value along a continuous scale.

How many subscribers next month? 12,400.
What will revenue be next quarter? $3.2M.

Section 2.2 — Supervised Learning

Features and Labels — MobTel Example

Features (X) — The Inputs

Everything we know about a customer. These are the measurements used to make a prediction — things the business already has on record.

Label / Target (Y) — The Output

The outcome the model is trained to predict. During training we know the label. At prediction time we do not — that is what we want to find out.

Tenure (months) | Monthly Charge | Num Complaints | Data Usage (GB) | Support Calls | Churned?
24 | $65 | 0 | 28.4 | 1 | No
8 | $92 | 3 | 6.2 | 5 | Yes
36 | $48 | 1 | 35.1 | 0 | No
4 | $88 | 4 | 4.8 | 7 | Yes
Features — what MobTel already knows about each customer
Label — whether the customer churned (historical outcome)
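As a minimal sketch, the four example customers can be written as a feature matrix X and a label vector y. The field names (tenure_months and so on) are illustrative assumptions rather than MobTel's actual schema, and the values assume each table row reads as tenure, charge, complaints, data usage, support calls, churned.

```python
# Four example MobTel customers; field names are illustrative assumptions.
customers = [
    {"tenure_months": 24, "monthly_charge": 65, "num_complaints": 0,
     "data_usage_gb": 28.4, "support_calls": 1, "churned": "No"},
    {"tenure_months": 8, "monthly_charge": 92, "num_complaints": 3,
     "data_usage_gb": 6.2, "support_calls": 5, "churned": "Yes"},
    {"tenure_months": 36, "monthly_charge": 48, "num_complaints": 1,
     "data_usage_gb": 35.1, "support_calls": 0, "churned": "No"},
    {"tenure_months": 4, "monthly_charge": 88, "num_complaints": 4,
     "data_usage_gb": 4.8, "support_calls": 7, "churned": "Yes"},
]

FEATURE_NAMES = ["tenure_months", "monthly_charge", "num_complaints",
                 "data_usage_gb", "support_calls"]

# X holds the inputs (one row per customer); y holds the known outcome.
X = [[row[name] for name in FEATURE_NAMES] for row in customers]
y = [1 if row["churned"] == "Yes" else 0 for row in customers]

print(X[0])  # features for the first customer
print(y)     # labels: 1 = churned, 0 = stayed
```

At prediction time only X is available for a new customer; y is what the model is asked to estimate.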
Section 2.3 — Supervised Learning

Classification vs Regression

Before selecting a model, you must identify what kind of answer you need. The type of output determines which category of supervised learning to apply.
Classification
Output is a category
Question format: Which group does this belong to?
Output examples: Yes/No, High/Medium/Low, Fraud/Legitimate
MobTel use case: Will this customer churn in the next 30 days? → Yes or No
Common metrics: Accuracy, Precision, Recall
Regression
Output is a number
Question format: How much or how many?
Output examples: $142,000, 12,400 subscribers, 3.7 rating
MobTel use case: How many new subscribers will we gain next month? → 12,400
Common metrics: MAE, RMSE, R²
Rule of thumb: If you can circle the answer on a multiple-choice list, it is classification. If you need a calculator to write down the answer, it is regression.
Section 2.4 — Supervised Learning

Training Data and Testing Data

[Bar diagram: Training Data 80% | Test 20%]

Training Data (80%)

The data used to teach the model. The model sees the features and labels together, and adjusts its internal parameters until its predictions match the known answers as closely as possible.

Testing Data (20%)

Data the model has never seen. We use it to check whether the model can generalise — whether it learned real patterns or just memorised the training examples.

Why split the data? A model that scores 98% on its training data but 62% on test data has not actually learned — it has memorised. Testing on held-out data gives us an honest estimate of real-world performance.
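A minimal sketch of an 80/20 split using only the Python standard library. The function name and the fixed seed are illustrative choices; in practice a library helper such as scikit-learn's train_test_split is commonly used for the same job.

```python
import random

def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out the last test_fraction as test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    split_at = len(shuffled) - n_test
    return shuffled[:split_at], shuffled[split_at:]

records = list(range(1000))  # stand-in for 1,000 customer records
train, test = train_test_split_rows(records)
print(len(train), len(test))
```

Every record ends up in exactly one of the two sets, so the test rows stay unseen during training. Random shuffling suits independent rows such as customer records; it is not appropriate for time-ordered data.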
Section 2.5 — Supervised Learning

Overfitting and Underfitting

Underfitting
Model is too simple
The model is too simple to capture the real pattern. It misses most of the signal. Training accuracy: 52%. Test accuracy: 50%.
Good Fit
Model generalises well
The model captures the underlying trend without memorising noise. Training accuracy: 85%. Test accuracy: 82%.
Overfitting
Model memorised training data
The model memorised every training point — including noise. It fails on new data. Training accuracy: 99%. Test accuracy: 61%.
Knowledge Checkpoint — Section 2

Supervised Learning

A data analyst at MobTel trains a churn prediction model using two years of customer records. On the training data, the model achieves 97% accuracy. When the model is evaluated on held-out test data, accuracy drops to 63%. What is the most likely explanation, and what should the analyst do?
Section 3

The Machine Learning Workflow

Building a machine learning model is not a single step. We examine the end-to-end process — from raw data to a trusted, evaluated model — and the techniques that make each stage reliable.

Section 3.1 — ML Workflow

The 8-Step Machine Learning Workflow

1. Define the Problem

What question are we answering? What does success look like?

2. Collect Data

Gather historical records with the labels and features we need.

3. Prepare Data

Clean, encode, scale, and split data into training and test sets.

4. Choose Algorithm

Select a model appropriate for the task: classification, regression, or forecasting.

5. Train the Model

Fit the model on training data. It adjusts its internal parameters to minimise prediction errors.

6. Evaluate

Measure performance on test data. Does the model generalise to unseen cases?

7. Tune and Improve

Adjust model settings (hyperparameters) and re-evaluate until performance is acceptable.

8. Deploy

Integrate the model into a business system so it can predict on new, live data.

Section 3.2 — ML Workflow

Preparing Your Data

"Garbage in, garbage out." The quality of a machine learning model is determined by the quality of its data. Data preparation is typically the most time-consuming step in any real ML project.

1. Handle Missing Values

Replace missing data with the column mean or median, or remove rows with too many gaps. Many models cannot process blank cells.

MobTel: 4.2% of records had missing data_usage values — filled with median (18.7 GB).

2. Encode Categories

Convert text categories to numbers. Most models only accept numeric inputs.

MobTel: contract_type (Monthly / Annual / Two-year) → encoded as 0, 1, 2.

3. Scale Features

Normalise feature values to a similar range so large numbers do not dominate small ones. Tenure (months) vs Monthly Charge ($) need to be comparable.

Note: LightGBM handles scale internally. Scaling is critical for distance-based models.

4. Split Before Processing

Always split into train and test before computing means, medians, or encodings. Using test data statistics during preparation leaks information into your evaluation.

This is one of the most common mistakes in student projects.
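The "split first" rule can be sketched as follows: the fill value is computed from the training rows only and then reused, unchanged, on the test rows. The data values and function names are hypothetical.

```python
from statistics import median

def fit_median(train_values):
    """Compute the fill value from the training rows only, ignoring gaps (None)."""
    return median(v for v in train_values if v is not None)

def fill_missing(values, fill_value):
    """Replace gaps with a fill value that was computed beforehand."""
    return [fill_value if v is None else v for v in values]

# Hypothetical data_usage_gb values, already split into train and test.
train_usage = [28.4, None, 6.2, 35.1, 4.8]
test_usage = [None, 50.0]

fill_value = fit_median(train_usage)                 # learned from train only
train_filled = fill_missing(train_usage, fill_value)
test_filled = fill_missing(test_usage, fill_value)   # reused, never refit on test
print(fill_value, test_filled)
```

The same pattern applies to scaling and encoding: fit the statistic on the training set, then apply it to the test set.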
Section 3.3 — ML Workflow

Cross-Validation — A More Reliable Evaluation

A single train/test split tests the model once, on one particular subset of data. If that subset happens to contain easy cases, the result looks artificially strong. 5-fold cross-validation splits the data into five equal parts and repeats the train-and-test process five times, with each part taking one turn as the test set. Averaging the five results gives a more stable estimate.
5-Fold Cross-Validation — Each fold takes a turn as the test set
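Fold 1: TEST | TRAIN | TRAIN | TRAIN | TRAIN
Fold 2: TRAIN | TEST | TRAIN | TRAIN | TRAIN
Fold 3: TRAIN | TRAIN | TEST | TRAIN | TRAIN
Fold 4: TRAIN | TRAIN | TRAIN | TEST | TRAIN
Fold 5: TRAIN | TRAIN | TRAIN | TRAIN | TEST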
Final score = (Fold 1 score + Fold 2 score + Fold 3 score + Fold 4 score + Fold 5 score) ÷ 5
This average is much more reliable than any single split, because no single lucky or unlucky test partition determines the outcome.
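A minimal sketch of how the folds can be generated and the scores averaged. The function names and the placeholder scorer are assumptions for illustration; a real scorer would train a model on the training indices and evaluate it on the test indices.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, test_idx) pairs; each row is in the test set exactly once."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def average_fold_scores(score_fold, n_rows, k=5):
    """Average the score returned by score_fold(train_idx, test_idx) over k folds."""
    scores = [score_fold(tr, te) for tr, te in k_fold_indices(n_rows, k)]
    return sum(scores) / len(scores), scores

# Placeholder scorer: returns the share of rows held out in this fold.
def dummy_scorer(train_idx, test_idx):
    return len(test_idx) / (len(train_idx) + len(test_idx))

mean_score, fold_scores = average_fold_scores(dummy_scorer, n_rows=10, k=5)
print(fold_scores, mean_score)
```

Looking at the individual fold scores as well as the mean shows how much the result depends on which rows were held out.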
Section 3.4 — ML Workflow

Evaluating Model Performance

"Accuracy" alone is misleading when data is imbalanced. If 95% of MobTel customers do not churn, a model that predicts "No" for everyone achieves 95% accuracy — but is completely useless.
Classification Metrics

Accuracy

Proportion of all predictions that are correct. Misleading when classes are imbalanced.

Correct predictions ÷ Total predictions

Precision

Of all customers predicted to churn, how many actually churned? High precision means few false alarms.

True Positives ÷ (True Positives + False Positives)

Recall (Sensitivity)

Of all customers who actually churned, how many did the model catch? High recall means few missed churners.

True Positives ÷ (True Positives + False Negatives)
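The three formulas can be checked on the imbalanced case described earlier, where 95% of customers do not churn and the model predicts "No" for everyone. This is a sketch with hypothetical data of 100 customers; returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
def classification_metrics(actual, predicted, positive="Yes"):
    """Return (accuracy, precision, recall) for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # no positive predictions made
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # no actual positives present
    return accuracy, precision, recall

# 100 hypothetical customers: 5 churners, 95 non-churners.
actual = ["Yes"] * 5 + ["No"] * 95
always_no = ["No"] * 100

accuracy, precision, recall = classification_metrics(actual, always_no)
print(accuracy, precision, recall)
```

Accuracy looks strong while recall shows that not a single churner was caught, which is why accuracy alone is not reported for imbalanced data.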
Regression Metrics

MAE — Mean Absolute Error

Average size of prediction errors in the original units. Easy to interpret.

Average of |Predicted − Actual|

RMSE — Root Mean Squared Error

Penalises large errors more heavily. Better when big mistakes are especially costly.

√(Average of (Predicted − Actual)²)

R² — Coefficient of Determination

Proportion of variance in the outcome that the model explains. R² = 1 is perfect; R² = 0 is no better than guessing the mean.
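A short sketch of the three regression metrics, written directly from the formulas above. The actual and predicted values are hypothetical.

```python
from math import sqrt

def regression_metrics(actual, predicted):
    """Return (MAE, RMSE, R squared) for paired actual and predicted values."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)               # squared error of the model
    rmse = sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # squared error of the mean
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2

# Hypothetical monthly subscriber counts and model predictions.
actual = [12400, 11800, 10200]
predicted = [12000, 12000, 10500]

mae, rmse, r2 = regression_metrics(actual, predicted)
print(mae, rmse, r2)
```

RMSE comes out above MAE here because squaring gives the single 400-subscriber miss more weight than the two smaller errors.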

Knowledge Checkpoint — Section 3

The ML Workflow

A MobTel analyst builds a churn prediction model and reports: "The model achieves 91% accuracy." A senior analyst is sceptical and asks for cross-validation results. The 5-fold cross-validation scores are 73%, 68%, 91%, 72%, and 70%, giving an average of 74.8%. What is the most important takeaway from this comparison?
Section 4

NeuralProphet — Time Series Forecasting

Used when data has a time dimension — monthly sales, weekly subscribers, daily call volumes. NeuralProphet captures trend and seasonality automatically, making it the go-to tool for business forecasting at MobTel.

Section 4.1 — NeuralProphet

What is a Time Series?

[Chart: Monthly New Subscribers, MobTel 2023. Horizontal axis: months, Jan to Jan. Vertical axis: Low to High. Each point is one measurement at a specific point in time.]
A time series is a sequence of measurements taken at regular intervals over time. The order of the data matters — removing the time dimension destroys the signal.
Business examples
MobTel: Monthly new mobile subscriber sign-ups, Jan 2020 — Dec 2023
Retail: Weekly revenue at each store
Finance: Daily closing stock price
Operations: Hourly call centre volume
Time series data has two characteristics that standard ML models do not capture without extra feature engineering: trend (values moving up or down over time) and seasonality (patterns that repeat on a regular schedule).
Section 4.2 — NeuralProphet

How NeuralProphet Works

NeuralProphet is an open-source Python library built on PyTorch and inspired by Meta's Prophet forecasting tool. It decomposes a time series into interpretable components, then combines them to produce a forecast. Unlike standard ML models, it is specifically designed for data ordered over time.
Input: Historical Time Series → Learns: Trend Component + Learns: Seasonality Component + Optional: Holiday Effects → Output: Forecast with intervals

Why not standard ML?

Standard models like LightGBM treat each row independently. They cannot naturally capture the fact that February follows January, or that subscribers peak every August.

Forecast horizon

NeuralProphet can forecast as far into the future as you specify — 3 months, 12 months, or beyond — though accuracy generally decreases for longer horizons.

Interpretable by design

Unlike a black-box neural network, NeuralProphet separates the forecast into readable components that a business stakeholder can understand and challenge.

Section 4.3 — NeuralProphet

Trend and Seasonality — Decomposition

Trend — the overall direction over time
Subscribers growing steadily month-on-month
Seasonality — repeating pattern within each year
Peaks in Jan/Feb and Aug/Sep every year
Trend + Seasonality = Forecast
Oscillating around a rising trend — this is the forecast

Trend

The long-run direction — whether the metric is growing or declining over months and years. NeuralProphet fits a piecewise linear function to capture this.

MobTel: subscriber base is growing by roughly 800 customers/month on average.

Seasonality

Regular patterns within a year, month, or week. NeuralProphet detects these automatically using Fourier terms — mathematical waves that fit repeating cycles.

MobTel: spikes in Jan (New Year promotions), Aug/Sep (back to school/university), Dec (holiday deals).

Why This Matters for Business

Separating trend from seasonality allows executives to answer: "Is our subscriber growth real, or just a seasonal blip?" and "Should we hire extra staff in August?"
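The additive idea "Trend + Seasonality = Forecast" can be sketched in a few lines. This is not NeuralProphet's implementation: it uses a single linear trend segment and one pair of Fourier terms, and every coefficient (base level, slope, wave sizes) is a hypothetical value that a real model would learn from data.

```python
import math

def trend(month_index, base=9000.0, slope=800.0):
    """One linear segment; a piecewise trend chains several such segments."""
    return base + slope * month_index

def seasonality(month_index, a=1500.0, b=400.0, period=12):
    """One pair of Fourier terms: a wave that repeats every `period` months."""
    angle = 2 * math.pi * month_index / period
    return a * math.cos(angle) + b * math.sin(angle)

def forecast(month_index):
    """Additive model: forecast = trend + seasonality."""
    return trend(month_index) + seasonality(month_index)

for month in (0, 6, 12):
    print(month, round(trend(month)), round(seasonality(month)), round(forecast(month)))
```

Because the seasonal wave returns to the same value every 12 months, the difference between month 12 and month 0 comes from the trend alone, which is the separation described above.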

Section 4.4 — NeuralProphet

MobTel Example — The Data

Business Problem: MobTel's Head of Network Planning needs a 3-month forecast of new mobile subscriber sign-ups to allocate network capacity and staffing for Q1 2024. The data science team has 24 months of monthly subscriber records.
Monthly New Subscriber Sign-ups — 2023
Month | New Subscribers | vs Prior Month
January 2023 | 12,400 | +18.3%
February 2023 | 11,800 | −4.8%
March 2023 | 10,200 | −13.6%
April 2023 | 9,500 | −6.9%
May 2023 | 9,100 | −4.2%
June 2023 | 8,800 | −3.3%
July 2023 | 9,200 | +4.5%
August 2023 | 11,600 | +26.1%
September 2023 | 13,100 | +12.9%
October 2023 | 10,400 | −20.6%
November 2023 | 11,200 | +7.7%
December 2023 | 14,800 | +32.1%
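The "vs Prior Month" column can be reproduced from the subscriber counts. A small sketch follows; the function name is illustrative.

```python
# Monthly new subscribers, January to December 2023, from the table.
new_subscribers = [12400, 11800, 10200, 9500, 9100, 8800,
                   9200, 11600, 13100, 10400, 11200, 14800]

def pct_change(values):
    """Percentage change of each value relative to the one before it."""
    return [round((cur - prev) / prev * 100, 1)
            for prev, cur in zip(values, values[1:])]

changes = pct_change(new_subscribers)
print(changes)  # February to December; January needs the unlisted December 2022 figure
```

The January change (+18.3%) cannot be recomputed here because it depends on December 2022, which the table does not list.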
What the data tells us
Peak 1: Jan–Feb
New Year promotions and customers switching after the holiday period drive a sharp early-year spike.
Trough: Jun–Jul
Mid-year is consistently the slowest period. Fewer new mobile contracts during winter months.
Peak 2: Aug–Sep
Back-to-school and university semester starts drive demand for new mobile plans.
Peak 3: Dec
Holiday gift purchases and end-of-year promotions create the strongest single month of the year.
Section 4.5 — NeuralProphet

MobTel Example — The Forecast Output

[Chart: Actual (2023) monthly new subscribers, Jan to Dec, followed by Forecast (Q1 2024): Jan 24 = 13,200, Feb 24 = 12,500, Mar 24 = 10,800. Vertical axis: 6k to 15k.]
January 2024 Forecast
13,200
Range: 12,400 — 14,000
February 2024 Forecast
12,500
Range: 11,600 — 13,400
March 2024 Forecast
10,800
Range: 9,900 — 11,700
Section 4.6 — NeuralProphet

Reading and Using the Forecast

What the forecast confirms

January is forecast to be the strongest month in Q1 (13,200), consistent with the New Year promotion spike observed in prior years. The model is extrapolating a pattern it found in 24 months of historical data.

What the confidence interval means

The range (e.g., 12,400 — 14,000 for January) reflects uncertainty in the forecast. Plan for the lower end to avoid overcommitting resources. Plan for the upper end for worst-case capacity.
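One way to turn the three forecast ranges into planning figures is sketched below. The dictionary layout and the key names (commit_to, capacity_for) are illustrative assumptions, not part of any NeuralProphet output format.

```python
# Q1 2024 forecasts and ranges as reported on the previous slide.
forecasts = {
    "2024-01": {"point": 13200, "low": 12400, "high": 14000},
    "2024-02": {"point": 12500, "low": 11600, "high": 13400},
    "2024-03": {"point": 10800, "low": 9900, "high": 11700},
}

def planning_levels(forecast):
    """Return the interval width plus conservative and worst-case planning figures."""
    return {
        "width": forecast["high"] - forecast["low"],
        "commit_to": forecast["low"],      # lower end: avoid overcommitting resources
        "capacity_for": forecast["high"],  # upper end: worst-case capacity
    }

for month, fc in forecasts.items():
    print(month, planning_levels(fc))
```

A wider interval signals more uncertainty, so the gap between what is committed and what is provisioned grows with it.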

Business decisions this enables

  • Pre-provision 15% additional network capacity for January
  • Schedule 22 additional customer onboarding staff for Jan–Feb
  • Order SIM card inventory before Dec 31 to avoid stock-outs
  • Set Q1 sales targets based on forecast, not last year's actuals

Important limitations

  • The model cannot anticipate a competitor launching a cheaper plan in January 2024
  • It assumes the 2023 pattern will repeat — a network outage could invalidate this
  • Forecasts degrade in accuracy beyond 3–6 months for volatile business data

Key Takeaway

A forecast is not a guarantee — it is a structured, data-driven estimate that replaces guessing. The model's value is that it brings consistency and evidence to a decision that would otherwise rely on intuition alone.

Knowledge Checkpoint — Section 4

NeuralProphet

MobTel's Head of Network Planning is told by a junior analyst: "Our NeuralProphet model predicts exactly 13,200 new subscribers in January 2024, so we should provision capacity for exactly that many." What is the most important problem with this statement?
Section 5

LightGBM — Gradient Boosted Trees

The industry standard for structured data classification and regression. We trace LightGBM from its decision tree foundations through to a step-by-step churn prediction example using MobTel customer data.

Section 5.1 — LightGBM

Decision Trees — The Foundation

MobTel Churn Decision Tree
Tenure < 12 months? (How long is the customer?)
  YES → Support calls > 3? (In past 6 months)
    YES → HIGH RISK (Churn = Yes)
    NO → MED RISK (Monitor)
  NO → Complaints > 2? (In past 12 months)
    YES → MED RISK (Monitor)
    NO → LOW RISK (Churn = No)
A single decision tree asks a sequence of yes/no questions to reach a prediction.
A decision tree makes predictions by asking a series of yes/no questions about the features, splitting the data at each step until it reaches a final prediction. Every path from root to leaf represents a rule the model has learned from data.
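Each root-to-leaf path is a rule, so the MobTel tree can be written as nested conditions. This sketch assumes the branch layout read from the diagram (short tenure leads to the support-calls question, longer tenure to the complaints question); the function and parameter names are illustrative.

```python
def churn_risk(tenure_months, support_calls_6m, complaints_12m):
    """Follow the yes/no questions of the example tree down to a leaf."""
    if tenure_months < 12:            # root question
        if support_calls_6m > 3:      # asked only for short-tenure customers
            return "HIGH RISK"
        return "MED RISK"
    if complaints_12m > 2:            # asked only for longer-tenure customers
        return "MED RISK"
    return "LOW RISK"

print(churn_risk(tenure_months=6, support_calls_6m=4, complaints_12m=2))
print(churn_risk(tenure_months=24, support_calls_6m=1, complaints_12m=0))
```

In a trained tree the questions and cut-off values are chosen by the algorithm from data rather than written by hand.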

Strengths

Highly interpretable — you can trace exactly why a customer was flagged as high risk. No feature scaling required.

Weakness

A single decision tree overfits easily — it memorises the training data rather than learning a general pattern. LightGBM solves this problem by combining many trees.

Section 5.2 — LightGBM

Gradient Boosting — How LightGBM Learns

[Diagram: Tree 1 (weak learner) makes initial churn predictions and calculates prediction errors → Tree 2 corrects Tree 1's mistakes; smaller errors remain → Trees 3, 4 ... N each correct the last → Final prediction = sum of all trees (churn probability 0.78 → Yes)]
Gradient boosting is an ensemble method — it builds hundreds of small, simple decision trees in sequence, where each new tree is specifically designed to fix the mistakes made by all previous trees combined.
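The "each tree fixes the remaining errors" idea can be sketched from scratch with one-split trees (stumps) and squared error, where the error to fix is simply actual minus predicted. This is a teaching sketch on hypothetical data, not LightGBM's algorithm: it has no leaf-wise growth, sampling, or regularisation.

```python
def fit_stump(x, residuals):
    """Find the single threshold split on x that best fits the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_trees=20, learning_rate=0.3):
    """Fit trees in sequence, each one to the errors left by the ensemble so far."""
    base = sum(y) / len(y)  # start from the mean of the target
    trees = []
    predictions = [base] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        predictions = [pi + learning_rate * tree(xi)
                       for xi, pi in zip(x, predictions)]
    return lambda xi: base + learning_rate * sum(tree(xi) for tree in trees)

def mse(model, x, y):
    """Mean squared error of a fitted model on the given data."""
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)

# Hypothetical toy data: tenure in months vs. a churn score.
tenure = [2, 4, 6, 8, 12, 18, 24, 36]
score = [0.9, 0.8, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1]

one_tree = boost(tenure, score, n_trees=1)
many_trees = boost(tenure, score, n_trees=20)
print(mse(one_tree, tenure, score), mse(many_trees, tenure, score))
```

The training error shrinks as trees are added, which is also why boosted models need a held-out test set: training error alone will keep improving even after the model starts to overfit.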

What makes LightGBM fast?

LightGBM uses leaf-wise tree growth instead of level-wise growth: it always expands the leaf that reduces error the most. Combined with histogram-based split finding, this often makes it faster on large datasets than earlier gradient boosting implementations such as XGBoost.

Why "Light"?

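Developed at Microsoft and described in a 2017 research paper. It uses two techniques, Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), to reduce the number of rows and features examined during training, so large datasets can be handled quickly and with modest memory.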

MobTel context

MobTel has 800,000 active customer records. LightGBM can train on this data in minutes on a standard laptop, making it practical for a business analytics team without specialised GPU hardware.

Section 5.3 — LightGBM

What Makes LightGBM Effective?

LightGBM has become the default starting point for most structured data problems in industry. It consistently outperforms simpler models and frequently wins data science competitions involving tabular (table-format) data.

Handles Mixed Data Types

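Works with numerical and categorical data without extensive preprocessing. MobTel can pass in contract type ("Monthly", "Annual") as a categorical column (for example, a pandas category dtype) instead of one-hot encoding it by hand; plain text columns must first be converted to a categorical type.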

Robust to Missing Values

Unlike many ML algorithms, LightGBM does not crash or require special handling when some feature values are missing. It learns the optimal direction for missing values from the training data.

Feature Importance

After training, LightGBM automatically reports which features were most important in making predictions. This tells MobTel which customer behaviours are the strongest predictors of churn.

Regularisation Built In

LightGBM has built-in mechanisms (L1, L2 regularisation and min_data_in_leaf) that penalise overfitting, making it far more reliable on test data than a single decision tree.

Section 5.4 — LightGBM

MobTel Churn Example — Setting Up the Problem

Business Problem: MobTel's retention team wants to identify which customers are likely to cancel their service in the next 30 days. With 800,000 customers, it is impossible to review every account manually. A LightGBM classifier will assign each customer a churn probability — the team focuses on those above 0.60.
Training Data — 800,000 customer records from the past 24 months
Customer ID | Tenure (mo) | Monthly Charge | Num Complaints | Data Usage (GB) | Support Calls | Churned?
C001 | 24 | $65 | 0 | 28.4 | 1 | No
C002 | 8 | $92 | 3 | 6.2 | 5 | Yes
C003 | 36 | $48 | 1 | 35.1 | 0 | No
C004 | 4 | $88 | 4 | 4.8 | 7 | Yes
... 799,996 more rows
Class balance

7.8% of customers churned (62,400 of 800,000). This is imbalanced — accuracy alone is misleading. We report Precision and Recall.

Decision threshold

LightGBM outputs a probability (0–1). Customers above 0.60 are flagged for retention outreach. This threshold is set by the business, not the model.

Section 5.5 — LightGBM

MobTel Churn Example — Step-by-Step Prediction

New Customer C005
Profile to predict
Tenure: 6 months
Monthly charge: $95
Complaints: 2
Data usage: 8.1 GB
Support calls: 4
Final Score
0.78
Threshold: 0.60
CHURN RISK = YES
Flag for retention outreach
How the ensemble reaches this score — simplified to 3 trees
Tree 1 — Initial Assessment

Question: Is tenure less than 12 months? → YES (6 months). Short tenure correlates strongly with churn. Tree 1 assigns an initial churn score of +0.50.

Tree 2 — Corrects Tree 1 Errors

Tree 2 focuses on customers Tree 1 mispredicted. For C005: Support calls > 3? → YES (4 calls). Monthly charge > $80? → YES ($95). Adds +0.22 to the score.

Tree 3 — Further Refinement

Tree 3 checks data usage. Data usage < 10 GB? → YES (8.1 GB). Low data usage + high support calls is a strong churn signal. Adds +0.06.

Final Score (after all N trees, simplified here)

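0.50 + 0.22 + 0.06 = 0.78. Since 0.78 > threshold of 0.60, C005 is classified as Churn = Yes. A retention offer is triggered automatically. (In a real LightGBM classifier the trees' outputs are summed on a log-odds scale and then converted to a probability; adding probabilities directly is a simplification for illustration.)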
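The simplified scoring for C005 can be sketched as a sum followed by a threshold check. The function names are illustrative. The sigmoid helper shows the extra step a real binary classifier applies: tree outputs are summed as raw log-odds scores and then squashed into the 0 to 1 range.

```python
import math

THRESHOLD = 0.60  # set by the business, not the model

def flag_for_retention(tree_contributions, threshold=THRESHOLD):
    """Sum the per-tree contributions and compare the total to the threshold."""
    score = sum(tree_contributions)
    return round(score, 2), score > threshold

def sigmoid(raw_score):
    """Convert a raw (log-odds) score into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-raw_score))

# Simplified slide example: three trees contribute 0.50, 0.22 and 0.06.
score, flagged = flag_for_retention([0.50, 0.22, 0.06])
print(score, flagged)

# A raw score of 0 maps to a probability of 0.5; larger raw scores map closer to 1.
print(sigmoid(0.0))
```

Changing THRESHOLD trades precision against recall: a lower value flags more customers and catches more churners, at the cost of more false alarms.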

Section 5.6 — LightGBM

Feature Importance — What Drives Churn at MobTel?

LightGBM Feature Importance Score — MobTel Churn Model
Num Complaints: 32%
Tenure (months): 28%
Support Calls: 22%
Monthly Charge: 12%
Data Usage (GB): 6%
(Relative importance in churn prediction model)
Feature importance tells us which inputs the model used most heavily. This converts the model from a black box into an actionable business insight.

1. Num Complaints (32%)

Implication: Complaint resolution is the top priority for retention. Complaints are the model's strongest churn signal, although importance alone does not show the direction or size of the effect.

2. Tenure (28%)

Implication: New customers are the most at risk. A targeted onboarding experience in the first 12 months could significantly reduce early churn.

3. Support Calls (22%)

Implication: Repeated support calls signal dissatisfaction. Customers making 4+ calls per 6 months should be proactively contacted by a relationship manager.

Note: Data usage (6%) matters less than expected. MobTel's customers who use little data may still be satisfied — it is the service experience (complaints, calls) that determines whether they stay.
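Importance percentages like those above are typically raw scores rescaled to sum to 100. A small sketch follows; the raw values are hypothetical numbers chosen to reproduce the slide's percentages and are not real model output.

```python
# Hypothetical raw importance scores (for example, how often each feature is used in splits).
raw_importance = {
    "num_complaints": 320,
    "tenure_months": 280,
    "support_calls": 220,
    "monthly_charge": 120,
    "data_usage_gb": 60,
}

def relative_importance(raw):
    """Rescale raw scores to whole percentages and rank them, largest first."""
    total = sum(raw.values())
    shares = {name: round(100 * value / total) for name, value in raw.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))

ranked = relative_importance(raw_importance)
for name, share in ranked.items():
    print(f"{name}: {share}%")
```

A small share is not the same as no contribution: a feature at 6% is still used by the model, so whether to drop it should be decided by comparing test performance with and without it.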
Section 5.7 — LightGBM

Choosing Between LightGBM and NeuralProphet

Both algorithms are powerful tools for business prediction — but they are designed for fundamentally different types of problems. Choosing the wrong tool leads to poor results regardless of how well it is implemented.
Criterion | NeuralProphet | LightGBM
Data type | Time series (ordered by date) | Structured tabular data (rows are independent)
Output type | Future values on a time axis | Category (churn/no churn) or number per row
Key strength | Captures trend and seasonality automatically | Handles many features, imbalanced classes, missing data
MobTel example | Forecast new subscriber sign-ups for Q1 2024 | Predict which of today's customers will churn next month
Signal to look for | The question mentions "over time", "by month", "next quarter", or repeating patterns | The question is about individual customers, transactions, or records, not a sequence over time
Knowledge Checkpoint — Section 5

LightGBM

After training a LightGBM churn model on MobTel's data, the team examines feature importance scores. They find that num_complaints (32%) and tenure_months (28%) are the two most influential features, while data_usage_gb (6%) contributes the least. A product manager concludes: "We should remove data usage from the model since it barely matters." What is the most accurate response?
Summary

Week 2 Summary and Next Week

Concepts Covered

  • AI is the broad goal; ML is a data-driven method to achieve it
  • Predictive analytics uses past data to estimate future outcomes
  • Supervised learning trains on labelled historical data
  • Overfitting: high training accuracy, poor test accuracy
  • Cross-validation gives a more reliable performance estimate than a single split
  • Precision and Recall are more informative than Accuracy for imbalanced data

Algorithms Applied

  • NeuralProphet: time series forecasting with trend and seasonality components
  • LightGBM: gradient boosted trees for classification and regression on tabular data
  • MobTel subscriber forecast for Q1 2024: 13,200 → 12,500 → 10,800
  • MobTel churn model: customer C005 scored 0.78 — flagged for retention
  • Feature importance: complaints and tenure drive churn most strongly

Next Week — Week 3: Deep Learning and Neural Networks

We move into neural networks — how they are structured, how they learn through backpropagation, and how they solve complex problems (image recognition, text analysis) that traditional ML cannot handle. The principles of training, overfitting, and evaluation you learned this week apply directly to deep learning.
