DATA4400

Data-Driven Forecasting

Week 4: Introductory Forecasting Methods

Smoothing Techniques and Exponential Methods

Learning Outcomes

By the end of this session, you will be able to:

Analyse different smoothing techniques and their applications in business forecasting
Apply exponential smoothing methods to time series data
Understand time series decomposition for trend and seasonal analysis
Evaluate forecasting models using error metrics and information criteria
Select appropriate forecasting methods based on data characteristics

The Challenge: Noisy Data

The Problem

Real-world data contains random fluctuations (noise)
Noise obscures meaningful patterns and trends
Raw data makes forecasting unreliable

Smoothing filters out short-term noise to reveal long-term signals

Smoothing Techniques: Overview

Technique	Core Concept	Best Used For	Key Feature
Moving Average	Equal weights to past values	Stable data, no trend	Simple, easy to interpret
Single Exponential	Decreasing weights for older data	Stable data, emphasize recent	More weight to recent observations
Double Exponential	Adds trend component	Data with trend, no seasonality	Captures direction of change
Triple Exponential	Adds trend + seasonal components	Data with trend and seasonality	Handles complex patterns

Recall: Moving Averages

How It Works

MA_t = (Y_t + Y_{t-1} + ... + Y_{t-k+1}) / k

k = number of periods to average
Each observation has equal weight
Larger k = smoother result
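The result is centered at time (t + (t - k + 1)) / 2 = t - (k - 1)/2, the midpoint of the averaging window, so it lags the latest observation by (k - 1)/2 periods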

3-point vs 5-point moving averages
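As a rough illustration of the moving-average formula, the sketch below computes trailing 3-point and 5-point averages in plain Python. The demand values and the helper name `moving_average` are made up for illustration and do not come from a course dataset.

```python
def moving_average(values, k):
    """Trailing k-point moving average: one output per full window."""
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and len(values)")
    return [sum(values[i - k + 1:i + 1]) / k for i in range(k - 1, len(values))]


# Hypothetical demand series (illustrative values only)
demand = [13, 17, 19, 23, 24, 20, 26, 25]

ma3 = moving_average(demand, 3)
ma5 = moving_average(demand, 5)
print("3-point:", [round(v, 2) for v in ma3])
print("5-point:", [round(v, 2) for v in ma5])
```

The 5-point series has fewer values and moves less from one period to the next, which is the "larger k = smoother result" effect.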

Knowledge Check 1

A retail store tracks daily sales with random spikes from promotions. The data shows no trend or seasonal pattern. Which method is most appropriate?

A) Double Exponential Smoothing (Holt's method)

B) Simple Moving Average or Single Exponential Smoothing

C) Triple Exponential Smoothing (Holt-Winters)

D) No smoothing needed

Single Exponential Smoothing

The Core Idea

Recent observations matter more than older ones

Key Characteristics

Assigns exponentially decreasing weights to past observations
Newest data gets highest weight
Oldest data gets lowest weight
Controlled by parameter α (alpha)
Best for data with no clear trend or seasonality
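A small sketch of the "exponentially decreasing weights" idea: under single exponential smoothing, the observation j periods back receives weight α(1 - α)^j. The function name `ses_weights` is hypothetical and only used for illustration.

```python
def ses_weights(alpha, n_lags):
    """Weight on the observation j periods back, for j = 0 .. n_lags - 1."""
    return [alpha * (1 - alpha) ** j for j in range(n_lags)]


for alpha in (0.9, 0.2):
    weights = ses_weights(alpha, 5)
    print(f"alpha={alpha}:", [round(w, 4) for w in weights])
```

With a high α almost all the weight sits on the newest observation; with a low α the weight is spread over many past periods.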

Business Example

ATM Cash Withdrawals

A bank monitors daily cash withdrawals at ATMs. Demand fluctuates from day to day but remains stable overall, with occasional irregular spikes.

Solution: Single Exponential Smoothing emphasizes recent withdrawal patterns while smoothing out random daily variations.

Single Exponential Smoothing: The Formula

Ŷ_{t+1} = α × Y_t + (1 - α) × Ŷ_t

Components Explained

Ŷ_{t+1} = Forecast for next period
Y_t = Actual value in current period
Ŷ_t = Forecast for current period
α (alpha) = Smoothing constant (0 ≤ α ≤ 1)

Understanding Alpha (α)

α close to 1 (e.g., 0.9):
Heavy weight on recent data → Responsive to changes

α close to 0 (e.g., 0.1):
Heavy weight on historical forecasts → Smooth and stable

Worked Example: Single Exponential Smoothing

Scenario: Forecasting Monthly Demand (α = 0.9)

Month	Actual Demand (Y_t)	Forecast (Ŷ_t)	Calculation
1	13	-	No prior forecast available
2	17	13	Naive forecast (use Y₁)
3	19	16.6	0.9×17 + 0.1×13 = 15.3 + 1.3 = 16.6
4	23	18.76	0.9×19 + 0.1×16.6 = 17.1 + 1.66 = 18.76
5	24	22.58	0.9×23 + 0.1×18.76 = 20.7 + 1.88 = 22.58
6	?	23.86	0.9×24 + 0.1×22.58 = 21.6 + 2.26 = 23.86
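The table can be reproduced with a few lines of Python. This is a minimal sketch that assumes, as the table does, that the first forecast is the naive one (month 2 reuses the month 1 actual); `ses_forecasts` is a hypothetical helper name.

```python
def ses_forecasts(actuals, alpha):
    """One-step-ahead SES forecasts.

    forecasts[i] is the forecast for period i + 2 (the first period has none).
    The first forecast is the naive one: it reuses the first actual value.
    """
    forecasts = [actuals[0]]
    for y in actuals[1:]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


demand = [13, 17, 19, 23, 24]          # months 1-5 from the table
forecasts = ses_forecasts(demand, alpha=0.9)
for month, f in enumerate(forecasts, start=2):
    print(f"Month {month}: forecast = {f:.2f}")
```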

Impact of Alpha (α) on Forecasts

Key Insight: Higher α values make the forecast more responsive to recent changes but may amplify noise. Lower α values produce smoother forecasts but may lag behind actual changes.
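To see this trade-off in numbers, the sketch below runs the same monthly demand figures from the worked example through two values of α. The choice of 0.2 as the "low" value is an assumption for illustration.

```python
def ses_forecasts(actuals, alpha):
    """One-step-ahead SES forecasts, starting from a naive first forecast."""
    forecasts = [actuals[0]]
    for y in actuals[1:]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


demand = [13, 17, 19, 23, 24]                    # months 1-5 from the worked example
responsive = ses_forecasts(demand, alpha=0.9)    # reacts quickly
stable = ses_forecasts(demand, alpha=0.2)        # changes gradually

for month, (hi, lo) in enumerate(zip(responsive, stable), start=2):
    print(f"Month {month}: alpha=0.9 -> {hi:.2f}, alpha=0.2 -> {lo:.2f}")
```

Because demand rises every month in this series, the α = 0.2 forecasts trail well behind the α = 0.9 forecasts, which is the lag described above.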

Knowledge Check 2

A forecaster uses α = 0.2 in Single Exponential Smoothing. What does this indicate about their forecasting approach?

A) They want forecasts to respond quickly to recent changes

B) They prefer smooth, stable forecasts that emphasize historical patterns

C) The data has strong seasonality

D) They are using double exponential smoothing

Double Exponential Smoothing (Holt's Method)

Problem: Single Exponential Smoothing does not perform well when data has a trend

The Solution

Add a trend component to the model
Uses two smoothing parameters:
- α (alpha) for the level
- β (beta) for the trend
Captures both current value and direction of change
Best for trending but non-seasonal data

Business Example

E-Commerce Sales Growth

An online retailer experiences steady monthly sales growth due to increasing market penetration, but no seasonal patterns.

Solution: Holt's method captures both the current sales level and the growth trend.

Double Exponential Smoothing: Formulas

1 Level Equation:

C_t = α × Y_t + (1 - α) × (C_{t-1} + T_{t-1})

Smooths the current process level at time t

2 Trend Equation:

T_t = β × (C_t - C_{t-1}) + (1 - β) × T_{t-1}

Smooths the trend value at time t

3 Forecast Equation:

Ŷ_{t+1} = C_t + T_t

Combines level and trend to produce forecast
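The three equations translate almost line for line into code. The sketch below is one possible implementation, not a reference one: the start-up values (level = first observation, trend = first difference) are an assumption, since the slides do not specify an initialisation, and the sales figures are hypothetical.

```python
def holt_one_step(actuals, alpha, beta):
    """Return (forecast for the next period, final level, final trend)."""
    if len(actuals) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialisation: level = first value, trend = first difference
    level = actuals[0]
    trend = actuals[1] - actuals[0]
    for y in actuals[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)     # level equation
        trend = beta * (level - prev_level) + (1 - beta) * trend   # trend equation
    return level + trend, level, trend                              # forecast equation


sales = [100, 104, 109, 113, 118, 122]   # hypothetical monthly sales with steady growth
forecast, level, trend = holt_one_step(sales, alpha=0.8, beta=0.2)
print(f"level={level:.2f}, trend={trend:.2f}, next-period forecast={forecast:.2f}")
```

On a perfectly linear series the forecast simply continues the line, which is a quick sanity check for any implementation.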

Understanding the Smoothing Parameters

Alpha (α) - Level Smoothing

Controls: Influence of recent data on the forecasted value

High α (e.g., 0.8):
Forecast reacts quickly to changes in data level

Low α (e.g., 0.2):
Forecast changes gradually, more stable

Beta (β) - Trend Smoothing

Controls: Influence of recent data on the trend

High β (e.g., 0.8):
Trend responds quickly to changes in direction

Low β (e.g., 0.2):
Trend is smooth and stable

Best Practice: Optimal α and β values are typically found by minimizing an error metric such as RMSE on historical data
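One simple way to carry out this search is a coarse grid over α and β. The sketch below is an assumption-laden illustration: it uses hypothetical sales data, a 0.1-step grid, and the same start-up rule as before (level = first value, trend = first difference). Real software usually uses a numerical optimiser instead of a grid.

```python
from math import sqrt


def holt_rmse(actuals, alpha, beta):
    """RMSE of one-step-ahead Holt forecasts (assumed start: level=y1, trend=y2-y1)."""
    level, trend = actuals[0], actuals[1] - actuals[0]
    squared_errors = []
    for i in range(1, len(actuals)):
        y = actuals[i]
        if i >= 2:
            # Score only forecasts made after the start-up values were set
            squared_errors.append((y - (level + trend)) ** 2)
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return sqrt(sum(squared_errors) / len(squared_errors))


sales = [100, 104, 109, 113, 118, 122, 129, 133, 140, 144]   # hypothetical data
grid = [i / 10 for i in range(1, 10)]                        # 0.1, 0.2, ..., 0.9

best_rmse, best_alpha, best_beta = min(
    (holt_rmse(sales, a, b), a, b) for a in grid for b in grid
)
print(f"best alpha={best_alpha}, beta={best_beta}, RMSE={best_rmse:.3f}")
```

Tuning and scoring on the same history tends to flatter the model, so holding back the most recent periods for evaluation is a sensible extra step.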

Knowledge Check 3

Your company's quarterly revenue has been growing steadily by approximately 5% each quarter. There are no seasonal effects. Which forecasting method should you use?

A) Simple Moving Average

B) Single Exponential Smoothing

C) Double Exponential Smoothing (Holt's method)

D) Triple Exponential Smoothing (Holt-Winters)

Triple Exponential Smoothing (Holt-Winters)

When to Use: Data exhibits both trend and seasonality

Key Features

Extends Holt's method with seasonal component
Three smoothing parameters:
- α for level
- β for trend
- γ (gamma) for seasonality
Handles complex, realistic patterns
Two variations:
- Additive: Seasonal fluctuations stay roughly constant in size as the level changes
- Multiplicative: Seasonal fluctuations grow in proportion to the level

Business Example

Airline Passenger Demand

Airline passenger numbers show:

Long-term growth trend
Seasonal peaks (summer holidays)
Seasonal dips (off-peak periods)

Solution: Holt-Winters captures level, trend, and seasonal patterns simultaneously.
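The slides name the three components without giving update equations. The sketch below assumes the common additive textbook form, with level, trend, and seasonal updates weighted by α, β, and γ, and a simple start-up rule based on the first two seasons. Both the exact equations and the initialisation are assumptions here, `holt_winters_additive` is a hypothetical helper rather than a library function, and the quarterly data is made up.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters; returns `horizon` forecasts past the end of y."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Assumed initialisation (one simple choice among several)
    first, second = y[:m], y[m:2 * m]
    trend = (sum(second) - sum(first)) / (m * m)     # average growth per period
    mid_level = sum(first) / m                        # level at the middle of season 1
    # Seasonal index = observation minus the trend line through season 1
    season = [y[i] - (mid_level + trend * (i - (m - 1) / 2)) for i in range(m)]
    level = mid_level + trend * (m - 1) / 2           # level at the end of season 1

    for t in range(m, len(y)):
        s_old = season[t % m]
        prev_level = level
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s_old

    n = len(y)
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]


pattern = [10, -5, -15, 10]                                    # hypothetical quarterly effect
passengers = [100 + 2 * t + pattern[t % 4] for t in range(12)] # hypothetical, noise-free
forecasts = holt_winters_additive(passengers, m=4, alpha=0.5, beta=0.3, gamma=0.4, horizon=4)
print([round(f, 2) for f in forecasts])
```

Because the example data has no noise, the forecasts continue the trend and repeat the quarterly pattern exactly. Real data would not be reproduced this cleanly.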

Identifying Seasonality in Data

Seasonality: Regular, repeating patterns at fixed intervals (monthly, quarterly, yearly)

Examples: Retail sales (holidays), electricity demand (summer/winter), tourism (peak seasons)

Choosing the Right Smoothing Method

DECISION FRAMEWORK

Does your data have a TREND?
│
├─ NO → Does it have SEASONALITY?
│   │
│   ├─ NO → Single Exponential Smoothing ✓
│   │        (or Simple Moving Average)
│   │
│   └─ YES → Seasonal Naive or Decomposition
│
└─ YES → Does it have SEASONALITY?
    │
    ├─ NO → Double Exponential Smoothing ✓
    │        (Holt's Method)
    │
    └─ YES → Triple Exponential Smoothing ✓
             (Holt-Winters Method)
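The decision framework is small enough to write as a function. This is only a sketch of the tree above. The function name `choose_method` is hypothetical, and the two flags still have to come from inspecting a plot of the data.

```python
def choose_method(has_trend: bool, has_seasonality: bool) -> str:
    """Map the two questions in the decision framework to a method name."""
    if has_trend and has_seasonality:
        return "Triple Exponential Smoothing (Holt-Winters)"
    if has_trend:
        return "Double Exponential Smoothing (Holt's method)"
    if has_seasonality:
        return "Seasonal Naive or Decomposition"
    return "Single Exponential Smoothing (or Simple Moving Average)"


print(choose_method(has_trend=True, has_seasonality=False))
```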

Remember: Always visualize your data first. Plot the time series to identify trends and seasonal patterns before selecting a method.

Business Applications of Smoothing Methods

Industry	Forecasting Need	Data Pattern	Recommended Method
Banking	Daily ATM cash withdrawals	Stable, random fluctuations	Single Exponential Smoothing
E-Commerce	Monthly online sales	Upward trend, no seasonality	Double Exponential (Holt's)
Airlines	Passenger demand	Trend + seasonal peaks	Triple Exponential (Holt-Winters)
Retail	Product inventory	Stable demand	Moving Average or Single ES
Manufacturing	Production planning	Trend + seasonal orders	Triple Exponential (Holt-Winters)

Knowledge Check 4

You are analyzing monthly electricity demand data. You observe that demand increases steadily each year (trend) and has clear summer and winter peaks (seasonality). Additionally, the seasonal peaks are getting larger as overall demand grows. Which model and type should you use?

A) Holt-Winters with Additive Seasonality

B) Holt-Winters with Multiplicative Seasonality

C) Double Exponential Smoothing

D) Single Exponential Smoothing

Time Series Decomposition

Breaking Down Time Series Components

Time series can be decomposed into distinct components to better understand underlying patterns

Additive Decomposition

Y_t = S_t + T_t + R_t

Y_t = Observed value
S_t = Seasonal component
T_t = Trend-cycle component
R_t = Remainder (noise)

Use when: Seasonal fluctuations are roughly constant over time
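The slides do not say how the components are estimated. One common approach is classical decomposition, where the trend comes from a centred moving average and the seasonal effect from averaging the detrended values by season. The sketch below assumes that approach, an even seasonal period, and hypothetical quarterly data; `decompose_additive` is an illustrative helper, not a library call.

```python
def decompose_additive(y, m):
    """Classical additive decomposition for an even seasonal period m.

    Returns (trend, seasonal, remainder); trend and remainder are None where
    the centred moving average is undefined (first and last m // 2 points).
    """
    if m % 2 != 0 or len(y) < 2 * m:
        raise ValueError("sketch assumes even m and at least two full seasons")
    half = m // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        # 2 x m moving average: the two end points get half weight
        trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / m
    # Average the detrended values for each position in the season
    raw = []
    for pos in range(m):
        vals = [y[t] - trend[t] for t in range(pos, len(y), m) if trend[t] is not None]
        raw.append(sum(vals) / len(vals))
    mean_raw = sum(raw) / m
    seasonal = [raw[t % m] - mean_raw for t in range(len(y))]   # indices sum to zero
    remainder = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
                 for t in range(len(y))]
    return trend, seasonal, remainder


pattern = [10, -5, -15, 10]                                  # hypothetical quarterly effect
series = [100 + 2 * t + pattern[t % 4] for t in range(12)]   # hypothetical data, no noise
trend, seasonal, remainder = decompose_additive(series, m=4)
print("trend:", trend)
print("seasonal:", [round(s, 2) for s in seasonal[:4]])
```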

Multiplicative Decomposition

Y_t = S_t × T_t × R_t

Components interact multiplicatively
Seasonal effect varies with level
More common in business data
Can transform to additive using logarithms

Use when: Seasonal fluctuations grow with the trend
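The logarithm point can be checked numerically: taking logs turns the product S_t × T_t × R_t into a sum, so additive tools can be applied to log(Y_t). The component values below are hypothetical, and the trick requires all values to be positive.

```python
import math

seasonal, trend, remainder = 1.20, 100.0, 0.98   # hypothetical component values
observed = seasonal * trend * remainder           # multiplicative model

# log(Y) = log(S) + log(T) + log(R)
log_sum = math.log(seasonal) + math.log(trend) + math.log(remainder)
print(math.isclose(math.log(observed), log_sum))
```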

Visualizing Decomposition

Benefit: Decomposition helps identify which components drive your data, informing method selection and improving forecast accuracy

Evaluating Forecast Accuracy

Question: How do we know if our forecast is good?

Error Metrics

We measure the difference between actual values and forecasted values using error metrics. These metrics quantify forecast performance.

RMSE

Root Mean Square Error

Penalizes large errors heavily

MAE

Mean Absolute Error

Average size of errors

MAPE

Mean Absolute Percentage Error

Percentage-based accuracy

Golden Rule: Lower error values = Better forecast performance

Root Mean Square Error (RMSE)

RMSE = √[(1/n) × Σ(Y_i - Ŷ_i)²]

What It Measures

Average magnitude of forecast errors
Same units as the original data
Squares errors before averaging (penalizes large errors)
More sensitive to outliers than MAE

When to Use

When large errors are particularly costly
Comparing models on same dataset
Most commonly reported metric

Worked Example

Forecast Errors: -10, +5, -3, +8

1 Square each error:
100, 25, 9, 64

2 Average:
(100 + 25 + 9 + 64) / 4 = 49.5

3 Take square root:
√49.5 = 7.04
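The same three steps in Python, using the forecast errors from the worked example:

```python
from math import sqrt

errors = [-10, 5, -3, 8]                       # forecast errors from the example

squared = [e ** 2 for e in errors]             # step 1: square each error
mean_squared = sum(squared) / len(squared)     # step 2: average (this is the MSE)
rmse = sqrt(mean_squared)                      # step 3: take the square root

print(squared, mean_squared, round(rmse, 2))
```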

Mean Absolute Percentage Error (MAPE)

MAPE = (1/n) × Σ|((Y_i - Ŷ_i) / Y_i)| × 100%

What It Measures

Average percentage error of forecasts
Scale-independent (allows comparison across datasets)
Easy to interpret (e.g., "5% error")
Avoids positive/negative cancellation

Advantages

Intuitive interpretation as percentage
Can compare accuracy across different products/regions
Commonly used in business contexts

Limitations

Undefined when any actual value is zero, and unstable when actual values are close to zero
Asymmetric (penalizes over-forecasts more than under-forecasts)

Interpretation Guide

MAPE < 10%
Excellent forecast accuracy

MAPE 10-20%
Good forecast accuracy

MAPE 20-50%
Reasonable forecast accuracy

MAPE > 50%
Poor forecast accuracy
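A minimal MAPE sketch, applied to months 3 to 5 of the earlier monthly demand example with its single exponential smoothing forecasts. The helper name `mape` and the choice to raise an error on zero actuals are assumptions for illustration.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("inputs must be non-empty and the same length")
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actuals, forecasts)) / len(actuals)


actual = [19, 23, 24]              # months 3-5 of the earlier demand example
forecast = [16.6, 18.76, 22.58]    # single exponential smoothing forecasts (alpha = 0.9)
score = mape(actual, forecast)
print(f"MAPE = {score:.2f}%")
```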

Mean Squared Error (MSE)

MSE = (1/n) × Σ(Y_i - Ŷ_i)²

What It Measures

Average of squared errors
RMSE = √MSE
Used in optimization algorithms
Heavily penalizes large errors

Relationship to RMSE:

MSE is in squared units, making interpretation difficult. RMSE converts back to original units by taking the square root.

Comparison of Error Metrics

Metric	Units	Outlier Sensitivity
RMSE	Original units	High
MSE	Squared units	High
MAE	Original units	Low
MAPE	Percentage	Medium
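The difference in outlier sensitivity between MAE and RMSE is easy to demonstrate. The sketch below uses two hypothetical error lists that differ in a single value.

```python
from math import sqrt


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean square error."""
    return sqrt(sum(e ** 2 for e in errors) / len(errors))


typical = [2, -2, 2, -2]          # hypothetical errors of similar size
with_outlier = [2, -2, 2, -20]    # same errors, but one large miss

for name, errs in (("typical", typical), ("with outlier", with_outlier)):
    print(f"{name}: MAE={mae(errs):.2f}, RMSE={rmse(errs):.2f}")
```

When all errors are the same size the two metrics agree. A single large miss pulls RMSE up much further than MAE because the error is squared before averaging.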

Practical Example: Calculating Errors

Month	Actual (Y)	Forecast MA (Ŷ)	Forecast ES (Ŷ)	Error MA	Error ES
3	19	15	16.6	4	2.4
4	23	18	18.76	5	4.24
5	24	21	22.58	3	1.42

RMSE (MA)

4.08

RMSE (ES)

2.93

Winner

Exponential Smoothing (lower RMSE)
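The comparison can be rebuilt from the raw demand figures. The sketch below assumes the MA column is a 2-period moving average of the two preceding months, which is an inference from the table values (15, 18, 21) rather than something the slides state, and uses α = 0.9 with a naive first forecast for exponential smoothing.

```python
from math import sqrt

demand = [13, 17, 19, 23, 24]   # months 1-5
months = (3, 4, 5)              # months scored in the table

# Assumption: MA forecast = average of the two preceding months
ma_forecast = {m: (demand[m - 3] + demand[m - 2]) / 2 for m in months}

# Single exponential smoothing, starting from the naive forecast for month 2
alpha = 0.9
ses_forecast = {2: demand[0]}
for m in months:
    ses_forecast[m] = alpha * demand[m - 2] + (1 - alpha) * ses_forecast[m - 1]


def rmse(forecast):
    """RMSE over the scored months for a {month: forecast} mapping."""
    errors = [demand[m - 1] - forecast[m] for m in months]
    return sqrt(sum(e ** 2 for e in errors) / len(errors))


rmse_ma, rmse_es = rmse(ma_forecast), rmse(ses_forecast)
print(f"RMSE (MA) = {rmse_ma:.2f}, RMSE (ES) = {rmse_es:.2f}")
```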

Knowledge Check 5

You are comparing two forecasting models. Model A has RMSE = 15.2 and MAPE = 8.5%. Model B has RMSE = 18.7 and MAPE = 7.2%. Which statement is correct?

A) Model A is clearly better because it has lower RMSE

B) Model B is clearly better because it has lower MAPE

C) Model A has smaller absolute errors, but Model B has better percentage accuracy

D) The models cannot be compared using these metrics

Model Selection: Information Criteria

Beyond Error Metrics: Information criteria help choose between different model types while penalizing complexity

Akaike Information Criterion (AIC)

AIC = -2 × log(Likelihood) + 2k

k = number of parameters
Penalizes model complexity
Use AICc for small samples
Lower AIC = Better model

Bayesian Information Criterion (BIC)

BIC = -2 × log(Likelihood) + k × log(n)

n = sample size
Stronger penalty for complexity than AIC when n ≥ 8 (log(n) > 2)
Favors simpler models
Lower BIC = Better model

Use Case: When comparing Single vs Double vs Triple Exponential Smoothing, use AIC/BIC to balance fit quality against model complexity
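Both criteria are one-line formulas once a model's log-likelihood is known. In the sketch below the log-likelihood values and sample size are hypothetical. In practice they come from the fitted model, and software packages differ in how they count k.

```python
from math import log


def aic(log_likelihood, k):
    """Akaike Information Criterion."""
    return -2 * log_likelihood + 2 * k


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion."""
    return -2 * log_likelihood + k * log(n)


n = 48   # hypothetical number of observations
# Hypothetical fitted models: (name, log-likelihood, number of parameters)
candidates = [("Single ES", -120.0, 1), ("Double ES", -110.0, 2), ("Triple ES", -109.5, 3)]

for name, ll, k in candidates:
    print(f"{name}: AIC={aic(ll, k):.1f}, BIC={bic(ll, k, n):.1f}")

best_by_aic = min(candidates, key=lambda c: aic(c[1], c[2]))[0]
best_by_bic = min(candidates, key=lambda c: bic(c[1], c[2], n))[0]
print("lowest AIC:", best_by_aic, "| lowest BIC:", best_by_bic)
```

In this made-up case the third parameter improves the likelihood too little to pay for its penalty, so both criteria prefer the two-parameter model.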

Practical Model Comparison

Example: Choosing Between Smoothing Methods

Model	Parameters	RMSE	MAPE	AIC	BIC
Single ES	1 (α)	8.45	6.2%	245.3	248.7
Double ES	2 (α, β)	6.12	4.8%	228.1	233.2
Triple ES	3 (α, β, γ)	6.08	4.7%	230.5	237.3

Analysis: Double ES offers the best balance. Triple ES has marginally better error metrics but higher information criteria due to added complexity.

Decision: Choose Double Exponential Smoothing - simpler model with comparable accuracy
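The ranking can be read off the table programmatically. The sketch below copies the table values into a dictionary and reports which model is lowest on each measure.

```python
# Values copied from the comparison table
models = {
    "Single ES": {"rmse": 8.45, "mape": 6.2, "aic": 245.3, "bic": 248.7},
    "Double ES": {"rmse": 6.12, "mape": 4.8, "aic": 228.1, "bic": 233.2},
    "Triple ES": {"rmse": 6.08, "mape": 4.7, "aic": 230.5, "bic": 237.3},
}

# For each measure, find the model with the lowest value
best = {
    metric: min(models, key=lambda name: models[name][metric])
    for metric in ("rmse", "mape", "aic", "bic")
}
for metric, name in best.items():
    print(f"lowest {metric.upper()}: {name}")
```

The error metrics and the information criteria disagree here, which is why the decision rests on AIC and BIC: they account for the extra parameter that Triple ES needs for its small accuracy gain.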

Software Tools for Implementation

Course Tools

Python

• statsmodels library
• ExponentialSmoothing()
• Full control over parameters
• Programmatic forecasting

Tableau

• Built-in forecasting
• Automatic parameter selection
• Visual exploration
• Business-friendly interface

Exploratory.io

• No-code forecasting
• Automatic decomposition
• Model comparison
• Quick prototyping

Today's Activity: You will implement these methods in Python and visualize forecasts in Tableau

Key Takeaways

1 Smoothing removes noise from time series data to reveal underlying patterns

2 Choose methods based on data characteristics:

Stable data → Single Exponential Smoothing
Trending data → Double Exponential Smoothing (Holt's)
Trending + Seasonal → Triple Exponential Smoothing (Holt-Winters)

3 Alpha (α) controls responsiveness: High α = reactive, Low α = stable

4 Decomposition breaks down data into Seasonal, Trend, and Remainder components

5 Evaluate models using error metrics (RMSE, MAPE, MSE) and information criteria (AIC, BIC)

6 Lower error values = Better forecasts. Always compare multiple models.

Connecting to Your Assessment

Assessment 3: Individual Forecasting Project

How This Week's Content Helps:

Technical Skills

Select appropriate forecasting methods based on your data patterns
Implement smoothing techniques in Python
Calculate and interpret error metrics
Visualize forecasts effectively
Justify method selection with data characteristics

Presentation Skills

Explain forecasting methods to business stakeholders
Present accuracy metrics clearly
Justify model choice with evidence
Communicate uncertainty and limitations
Provide actionable recommendations

Pro Tip: For your assessment, start by visualizing your data to identify trends and seasonality. This will guide your method selection and strengthen your justification.

Summary and Next Steps

Today's Journey

What We Covered

Smoothing techniques for noise reduction
Single, Double, and Triple Exponential Smoothing
Time series decomposition
Model evaluation using error metrics
Model selection using information criteria
Practical business applications

Next Week: Prophet

Facebook's Prophet forecasting tool
Handling holidays and special events
Automatic changepoint detection
Business-oriented forecasting at scale
Uncertainty intervals and visualization

Action Items:

Complete Python activities for hands-on practice
Experiment with Tableau forecasting features
Review error calculation methods
Begin thinking about your Assessment 3 dataset