Before moving forward, we revisit the core ideas from last week — the distinction between AI and machine learning, the four levels of analytics, and why predictive analytics matters for business today.
Artificial intelligence (AI): The broad goal of making computers behave intelligently by simulating human reasoning, understanding, and decision-making. It covers many different technologies and approaches.
Machine learning (ML): A subset of AI. Instead of writing explicit rules, we give the computer data and let it learn patterns automatically. The model generally improves its predictions as it sees more representative data.
| Level | ML Role | Example Technique |
|---|---|---|
| Descriptive | Summarise and segment data | K-Means clustering, PCA |
| Diagnostic | Find patterns that explain outcomes | Decision trees, SHAP analysis |
| Predictive (today's focus) | Forecast future outcomes from past data | LightGBM, NeuralProphet |
| Prescriptive | Recommend the best action to take | Causal ML, Reinforcement Learning |
Customer transactions, web clickstreams, IoT sensors, social media — historical data now exists at scale.
Cloud platforms (AWS, GCP, Azure) allow businesses to run ML models in hours rather than months.
Python libraries like LightGBM and NeuralProphet make production-quality forecasting available to analysts, not just data scientists.
Supervised learning is the most widely used form of machine learning in business. We explore how it works, how data is structured for a model, and the crucial distinction between training performance and real-world performance.
Classification: Predicts a category. The answer falls into one of a fixed set of groups.
Will this customer churn? Yes or No.
Is this transaction fraudulent? Fraud or Legitimate.
Regression: Predicts a number. The answer can be any value along a continuous scale.
How many subscribers next month? 12,400.
What will revenue be next quarter? $3.2M.
Features: Everything we know about a customer. These are the measurements used to make a prediction, drawn from what the business already has on record.
Label: The outcome the model is trained to predict. During training we know the label. At prediction time we do not; that is what we want to find out.
| Tenure (months) | Monthly Charge | Num Complaints | Data Usage (GB) | Support Calls | Churned? |
|---|---|---|---|---|---|
| 24 | $65 | 0 | 28.4 | 1 | No |
| 8 | $92 | 3 | 6.2 | 5 | Yes |
| 36 | $48 | 1 | 35.1 | 0 | No |
| 4 | $88 | 4 | 4.8 | 7 | Yes |
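As a minimal sketch of how the table above separates into features and a label, the four rows can be held as plain Python dictionaries. The key names (`tenure_months`, `monthly_charge`, and so on) are illustrative choices for this example, not MobTel's actual schema.

```python
# The four example customers from the table, one dict per row.
# Key names are illustrative, not an official schema.
rows = [
    {"tenure_months": 24, "monthly_charge": 65, "num_complaints": 0,
     "data_usage_gb": 28.4, "support_calls": 1, "churned": "No"},
    {"tenure_months": 8, "monthly_charge": 92, "num_complaints": 3,
     "data_usage_gb": 6.2, "support_calls": 5, "churned": "Yes"},
    {"tenure_months": 36, "monthly_charge": 48, "num_complaints": 1,
     "data_usage_gb": 35.1, "support_calls": 0, "churned": "No"},
    {"tenure_months": 4, "monthly_charge": 88, "num_complaints": 4,
     "data_usage_gb": 4.8, "support_calls": 7, "churned": "Yes"},
]

LABEL = "churned"
feature_names = [name for name in rows[0] if name != LABEL]

# X holds the features (what we already know about each customer).
X = [[row[name] for name in feature_names] for row in rows]
# y holds the label (what we want to predict), encoded as 1 = churned.
y = [1 if row[LABEL] == "Yes" else 0 for row in rows]

print(feature_names)
print(X[1], "->", y[1])
```

At prediction time only `X` is available for a new customer; the model's job is to supply the missing `y`.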
Training set: The data used to teach the model. The model sees the features and labels together, and adjusts its internal parameters until its predictions match the known answers as closely as possible.
Test set: Data the model has never seen. We use it to check whether the model can generalise, that is, whether it learned real patterns or just memorised the training examples.
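The split between training and test data can be sketched with the standard library alone. The helper name `train_test_split`, the 20% test fraction, and the customer IDs below are assumptions made for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold back a fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical customer IDs standing in for full records.
customers = [f"C{i:03d}" for i in range(1, 11)]
train, test = train_test_split(customers)

print(len(train), "training rows;", len(test), "test rows")
```

A random shuffle suits independent customer records. For time-ordered data such as monthly subscriber counts, the usual choice is a chronological split, training on earlier periods and testing on later ones.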
Building a machine learning model is not a single step. We examine the end-to-end process — from raw data to a trusted, evaluated model — and the techniques that make each stage reliable.
Define the problem: What question are we answering? What does success look like?
Collect data: Gather historical records with the labels and features we need.
Prepare data: Clean, encode, scale, and split data into training and test sets.
Choose a model: Select a model appropriate for the task (classification, regression, or forecasting).
Train: Fit the model on training data. It adjusts its internal parameters to minimise prediction errors.
Evaluate: Measure performance on test data. Does the model generalise to unseen cases?
Tune: Adjust model settings (hyperparameters) and re-evaluate until performance is acceptable.
Deploy: Integrate the model into a business system so it can predict on new, live data.
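Missing values: Replace missing data with the column mean or median, or remove rows with too many gaps. Many models cannot process blank cells, although some (LightGBM among them) handle missing values natively.
Encoding: Convert text categories to numbers. Most models only accept numeric inputs.
Scaling: Bring feature values into a similar range so that large numbers do not dominate small ones. Tenure (months) and Monthly Charge ($) sit on different scales. This matters most for distance-based and gradient-trained models; tree-based models such as LightGBM do not need it.
Avoid leakage: Always split into train and test sets before computing means, medians, scaling parameters, or encodings. Compute them on the training set only, then apply them to the test set. Using test-set statistics during preparation leaks information into your evaluation.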
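The preparation steps above can be sketched in a leak-free order: every statistic is learned from the training set and then reused unchanged on the test set. The values and helper names (`impute`, `standardise`) are made up for illustration.

```python
from statistics import mean, pstdev

# Hypothetical monthly charges; None marks a missing value.
train_charges = [65.0, 92.0, None, 48.0, 88.0]
test_charges = [70.0, None]

# 1. Learn the fill value from the training set only.
observed = [v for v in train_charges if v is not None]
fill_value = mean(observed)

def impute(values, fill):
    """Replace missing entries with a value learned from training data."""
    return [fill if v is None else v for v in values]

train_filled = impute(train_charges, fill_value)
test_filled = impute(test_charges, fill_value)  # reuses the training mean

# 2. Scale both sets with the training mean and standard deviation.
mu = mean(train_filled)
sigma = pstdev(train_filled)

def standardise(values, mu, sigma):
    """Centre on mu and divide by sigma, both learned from training data."""
    return [(v - mu) / sigma for v in values]

train_scaled = standardise(train_filled, mu, sigma)
test_scaled = standardise(test_filled, mu, sigma)

# 3. Encode a text category as integers using the training vocabulary.
train_contracts = ["Monthly", "Annual", "Monthly"]
codes = {name: i for i, name in enumerate(sorted(set(train_contracts)))}
encoded = [codes[c] for c in train_contracts]

print(fill_value, test_filled, encoded)
```

The test set never contributes to `fill_value`, `mu`, `sigma`, or `codes`, so the later evaluation reflects what the model would face on genuinely new data.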
Accuracy: Proportion of all predictions that are correct. Misleading when classes are imbalanced.
Precision: Of all customers predicted to churn, how many actually churned? Low precision means many false alarms.
Recall: Of all customers who actually churned, how many did the model catch? Low recall means many missed cases.
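The three classification measures above (usually called accuracy, precision, and recall) can be computed directly from counts of correct and incorrect predictions. This is a small sketch on made-up labels; the function name is an assumption.

```python
def classification_metrics(actual, predicted):
    """Return accuracy, precision and recall for binary labels (1 = churn)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # churners caught
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # churners missed
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # correctly left alone
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Made-up example: 10 customers, 2 of whom actually churned.
actual    = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

accuracy, precision, recall = classification_metrics(actual, predicted)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here accuracy looks healthy at 0.80, yet only half of the flagged customers really churned and only half of the real churners were caught.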
Mean absolute error (MAE): Average size of prediction errors in the original units. Easy to interpret.
Root mean squared error (RMSE): Penalises large errors more heavily. Better when big mistakes are especially costly.
R²: Proportion of variance in the outcome that the model explains. R² = 1 is perfect; R² = 0 is no better than guessing the mean, and a negative value on test data means the model does worse than that.
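The regression measures above correspond to what are usually called mean absolute error (MAE), root mean squared error (RMSE), and R². A minimal sketch on made-up numbers shows how one large miss affects RMSE more than MAE.

```python
from math import sqrt

def regression_metrics(actual, predicted):
    """Return MAE, RMSE and R-squared for two equal-length lists."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2

# Made-up values: three small errors of 10 and one large error of 40.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 360]
mae, rmse, r2 = regression_metrics(actual, predicted)

# Baseline: always predict the mean of the actual values.
baseline = [sum(actual) / len(actual)] * len(actual)
_, _, baseline_r2 = regression_metrics(actual, baseline)

print(f"MAE={mae:.1f} RMSE={rmse:.1f} R2={r2:.3f} baseline R2={baseline_r2:.1f}")
```

The single error of 40 pulls RMSE (about 21.8) above MAE (17.5), which is the sense in which RMSE penalises large errors more heavily.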
Time series forecasting is used when data has a time dimension, such as monthly sales, weekly subscribers, or daily call volumes. NeuralProphet captures trend and seasonality automatically, making it the go-to tool for business forecasting at MobTel.
Standard models like LightGBM treat each row independently. Unless date-based or lagged features are engineered by hand, they cannot capture the fact that February follows January, or that subscribers peak every August.
NeuralProphet can forecast as far into the future as you specify — 3 months, 12 months, or beyond — though accuracy generally decreases for longer horizons.
Unlike a black-box neural network, NeuralProphet separates the forecast into readable components that a business stakeholder can understand and challenge.
Trend: The long-run direction, that is, whether the metric is growing or declining over months and years. NeuralProphet fits a piecewise linear function to capture this.
MobTel: subscriber base is growing by roughly 800 customers/month on average.
Seasonality: Regular patterns within a year, month, or week. NeuralProphet detects these automatically using Fourier terms, which are sine and cosine waves fitted to repeating cycles.
MobTel: spikes in Jan (New Year promotions), Aug/Sep (back to school/university), Dec (holiday deals).
Separating trend from seasonality allows executives to answer: "Is our subscriber growth real, or just a seasonal blip?" and "Should we hire extra staff in August?"
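The two components described above can be sketched as an additive model: a piecewise linear trend plus a sum of Fourier waves. This is a hand-rolled illustration of the idea, not NeuralProphet's implementation; the function names, the single changepoint, and every coefficient are made-up assumptions.

```python
import math

def piecewise_linear_trend(t, base, slope, changepoint, slope_after):
    """Trend with one changepoint: the growth rate switches at `changepoint`."""
    if t <= changepoint:
        return base + slope * t
    return base + slope * changepoint + slope_after * (t - changepoint)

def fourier_seasonality(t, period, coefficients):
    """Sum of sine and cosine waves; coefficients is a list of (a_k, b_k)."""
    total = 0.0
    for k, (a, b) in enumerate(coefficients, start=1):
        angle = 2 * math.pi * k * t / period
        total += a * math.sin(angle) + b * math.cos(angle)
    return total

# Made-up parameters for illustration only.
SEASON_COEFFS = [(900.0, 1500.0), (300.0, -200.0)]

def forecast(month_index):
    """Return (trend, seasonality, total) for a month counted from 0."""
    trend = piecewise_linear_trend(month_index, base=10_000, slope=800,
                                   changepoint=12, slope_after=500)
    season = fourier_seasonality(month_index, period=12,
                                 coefficients=SEASON_COEFFS)
    return trend, season, trend + season

for m in (0, 6, 12, 18):
    trend, season, total = forecast(m)
    print(f"month {m:2d}: trend={trend:8.0f} season={season:8.0f} total={total:8.0f}")
```

Because the seasonal term repeats every 12 months while the trend keeps moving, printing the two columns side by side shows how much of any month's figure is lasting growth and how much is a recurring pattern.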
| Month | New Subscribers | vs Prior Month |
|---|---|---|
| January 2023 | 12,400 | +18.3% |
| February 2023 | 11,800 | −4.8% |
| March 2023 | 10,200 | −13.6% |
| April 2023 | 9,500 | −6.9% |
| May 2023 | 9,100 | −4.2% |
| June 2023 | 8,800 | −3.3% |
| July 2023 | 9,200 | +4.5% |
| August 2023 | 11,600 | +26.1% |
| September 2023 | 13,100 | +12.9% |
| October 2023 | 10,400 | −20.6% |
| November 2023 | 11,200 | +7.7% |
| December 2023 | 14,800 | +32.1% |
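The "vs Prior Month" column can be reproduced from the subscriber counts alone, as a quick check on the table. January's +18.3% cannot be recomputed here because the December 2022 figure is not shown.

```python
# New subscribers per month, copied from the table above.
new_subscribers = {
    "January 2023": 12_400, "February 2023": 11_800, "March 2023": 10_200,
    "April 2023": 9_500, "May 2023": 9_100, "June 2023": 8_800,
    "July 2023": 9_200, "August 2023": 11_600, "September 2023": 13_100,
    "October 2023": 10_400, "November 2023": 11_200, "December 2023": 14_800,
}

months = list(new_subscribers)
changes = {}
for previous, current in zip(months, months[1:]):
    before = new_subscribers[previous]
    after = new_subscribers[current]
    # Percentage change relative to the prior month, to one decimal place.
    changes[current] = round((after - before) / before * 100, 1)

peak_month = max(new_subscribers, key=new_subscribers.get)

for month, change in changes.items():
    print(f"{month}: {change:+.1f}%")
print("Peak month:", peak_month)
```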
January is forecast to be the strongest month in Q1 (13,200), consistent with the New Year promotion spike observed in prior years. The model is extrapolating a pattern it found in 24 months of historical data.
The range (e.g., 12,400 to 14,000 for January) reflects uncertainty in the forecast. Plan to the lower end when committing spend that depends on sign-ups materialising, so you do not overcommit resources. Plan to the upper end when sizing capacity for the busiest plausible case.
A forecast is not a guarantee — it is a structured, data-driven estimate that replaces guessing. The model's value is that it brings consistency and evidence to a decision that would otherwise rely on intuition alone.
LightGBM is a widely used standard for classification and regression on structured data. We trace it from its decision tree foundations through to a step-by-step churn prediction example using MobTel customer data.
A single decision tree is highly interpretable: you can trace exactly why a customer was flagged as high risk. No feature scaling is required.
A single decision tree also overfits easily: it memorises the training data rather than learning a general pattern. LightGBM reduces this problem by combining many small trees, each one correcting the errors of those before it.
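LightGBM uses leaf-wise tree growth instead of level-wise growth. It always expands the leaf that reduces error the most, which often makes training faster on large datasets than older level-wise methods such as early versions of XGBoost. Leaf-wise growth can overfit small datasets, so tree size is usually capped.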
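Developed at Microsoft and described in a 2017 research paper. It uses two techniques to train on large datasets quickly and with modest memory: Gradient-based One-Side Sampling (GOSS), which concentrates on the rows the model currently predicts worst, and Exclusive Feature Bundling (EFB), which merges sparse features that are rarely non-zero at the same time.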
MobTel has 800,000 active customer records. LightGBM can train on this data in minutes on a standard laptop, making it practical for a business analytics team without specialised GPU hardware.
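Works with numerical and categorical data without extensive preprocessing. MobTel can pass in contract type ("Monthly", "Annual") as a categorical column (for example, a pandas category dtype) without one-hot encoding it. Raw text strings still need converting to a category or integer code first.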
Unlike many ML algorithms, LightGBM does not crash or require special handling when some feature values are missing. It learns the optimal direction for missing values from the training data.
After training, LightGBM automatically reports which features were most important in making predictions. This tells MobTel which customer behaviours are the strongest predictors of churn.
LightGBM has built-in mechanisms (L1 and L2 regularisation, and min_data_in_leaf) that limit overfitting, typically making it far more reliable on test data than a single decision tree.
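As a rough sketch, the overfitting controls mentioned above map onto entries in a LightGBM parameter dictionary. The parameter names below are LightGBM's documented names; the values are placeholders chosen for illustration and are not tuned for MobTel.

```python
# Illustrative settings only; values are placeholders, not recommendations.
params = {
    "objective": "binary",     # churn vs no churn
    "learning_rate": 0.05,     # how much each new tree contributes
    "num_leaves": 31,          # caps tree complexity under leaf-wise growth
    "min_data_in_leaf": 100,   # each leaf must cover at least this many rows
    "lambda_l1": 0.1,          # L1 regularisation strength
    "lambda_l2": 1.0,          # L2 regularisation strength
}

# With the library installed, a dictionary like this is what gets passed to
# lightgbm.train(params, train_set); training itself is not run here.
for name, value in params.items():
    print(f"{name} = {value}")
```

Raising `min_data_in_leaf` or the two `lambda` values makes the model more conservative; lowering them lets it fit the training data more closely.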
| Customer ID | Tenure (mo) | Monthly Charge | Num Complaints | Data Usage (GB) | Support Calls | Churned? |
|---|---|---|---|---|---|---|
| C001 | 24 | $65 | 0 | 28.4 | 1 | No |
| C002 | 8 | $92 | 3 | 6.2 | 5 | Yes |
| C003 | 36 | $48 | 1 | 35.1 | 0 | No |
| C004 | 4 | $88 | 4 | 4.8 | 7 | Yes |
| ... 799,996 more rows | | | | | | |
7.8% of customers churned (62,400 of 800,000). This is imbalanced — accuracy alone is misleading. We report Precision and Recall.
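The arithmetic behind this warning is short enough to check directly. The sketch below uses the two counts stated above and an imaginary "model" that predicts No for every customer.

```python
total_customers = 800_000
churned = 62_400

churn_rate = churned / total_customers

# A "model" that predicts No for everyone is right for every non-churner...
baseline_accuracy = (total_customers - churned) / total_customers
# ...but catches none of the customers who actually churned.
baseline_recall = 0 / churned

print(f"Churn rate: {churn_rate:.1%}")
print(f"Always-No accuracy: {baseline_accuracy:.1%}")
print(f"Always-No recall: {baseline_recall:.0%}")
```

An accuracy of 92.2% with a recall of 0% is why precision and recall are reported alongside accuracy for this dataset.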
LightGBM outputs a probability (0–1). Customers above 0.60 are flagged for retention outreach. This threshold is set by the business, not the model.
Tree 1 asks of customer C005: Is tenure less than 12 months? → YES (6 months). Short tenure correlates strongly with churn. Tree 1 assigns an initial churn score of +0.50.
Tree 2 focuses on customers Tree 1 mispredicted. For C005: Support calls > 3? → YES (4 calls). Monthly charge > $80? → YES ($95). Adds +0.22 to the score.
Tree 3 checks data usage. Data usage < 10 GB? → YES (8.1 GB). Low data usage + high support calls is a strong churn signal. Adds +0.06.
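0.50 + 0.22 + 0.06 = 0.78. Since 0.78 > threshold of 0.60, C005 is classified as Churn = Yes. A retention offer is triggered automatically. This walkthrough is simplified: in practice LightGBM sums the trees' raw scores and converts the total to a probability, rather than adding probabilities directly.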
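The walkthrough for C005 can be replayed with three hand-written rules standing in for the learned trees. The rule thresholds and scores come from the steps above; the score of 0.0 on the "no" branch of each rule and the dictionary key names are assumptions made for this sketch.

```python
THRESHOLD = 0.60  # set by the business, not the model

# Hand-written stand-ins for the three trees in the walkthrough.
# Real boosted trees are learned from data and add raw scores, not probabilities.
def tree_1(c):
    return 0.50 if c["tenure_months"] < 12 else 0.0

def tree_2(c):
    return 0.22 if c["support_calls"] > 3 and c["monthly_charge"] > 80 else 0.0

def tree_3(c):
    return 0.06 if c["data_usage_gb"] < 10 else 0.0

def churn_score(customer):
    """Add up the contribution of each tree for one customer."""
    return sum(tree(customer) for tree in (tree_1, tree_2, tree_3))

# C005 as described in the walkthrough.
c005 = {"tenure_months": 6, "support_calls": 4,
        "monthly_charge": 95, "data_usage_gb": 8.1}
# C003 from the sample table, for contrast.
c003 = {"tenure_months": 36, "support_calls": 0,
        "monthly_charge": 48, "data_usage_gb": 35.1}

for name, customer in (("C005", c005), ("C003", c003)):
    score = churn_score(customer)
    flagged = score > THRESHOLD
    print(f"{name}: score={score:.2f} flagged={flagged}")
```

Moving `THRESHOLD` up or down trades precision against recall: a lower value flags more customers and catches more churners, at the cost of more false alarms.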
Implication: Complaint resolution is the top priority for retention. Every unresolved complaint raises churn probability significantly.
Implication: New customers are the most at risk. A targeted onboarding experience in the first 12 months could significantly reduce early churn.
Implication: Repeated support calls signal dissatisfaction. Customers making 4+ calls per 6 months should be proactively contacted by a relationship manager.
| Criterion | NeuralProphet | LightGBM |
|---|---|---|
| Data type | Time series (ordered by date) | Structured tabular data (rows are independent) |
| Output type | Future values on a time axis | Category (churn/no churn) or number per row |
| Key strength | Captures trend and seasonality automatically | Handles many features, imbalanced classes, missing data |
| MobTel example | Forecast new subscriber sign-ups for Q1 2024 | Predict which of today's customers will churn next month |
| Signal to look for | The question mentions "over time", "by month", "next quarter", or repeating patterns | The question is about individual customers, transactions, or records — not a sequence over time |
Next, we move into neural networks: how they are structured, how they learn through backpropagation, and how they tackle complex problems (image recognition, text analysis) where traditional ML tends to struggle. The principles of training, overfitting, and evaluation you learned this week apply directly to deep learning.