DATA4400

Data-Driven Forecasting

Week 4: Introductory Forecasting Methods
Smoothing Techniques and Exponential Methods

Learning Outcomes

By the end of this session, you will be able to:

The Challenge: Noisy Data

The Problem

  • Real-world data contains random fluctuations (noise)
  • Noise obscures meaningful patterns and trends
  • Raw data makes forecasting unreliable
Smoothing filters out short-term noise to reveal long-term signals

Smoothing Techniques: Overview

Technique | Core Concept | Best Used For | Key Feature
Moving Average | Equal weights to past values | Stable data, no trend | Simple, easy to interpret
Single Exponential | Decreasing weights for older data | Stable data, emphasize recent | More weight to recent observations
Double Exponential | Adds trend component | Data with trend, no seasonality | Captures direction of change
Triple Exponential | Adds trend + seasonal components | Data with trend and seasonality | Handles complex patterns

Recall: Moving Averages

How It Works

MA = (Yt + Yt-1 + ... + Yt-k+1) / k
  • k = number of periods to average
  • Each observation has equal weight
  • Larger k = smoother result
  • The average is centered at time (t + (t-k+1))/2 = t - (k-1)/2, so a trailing average lags the data by (k-1)/2 periods

3-point vs 5-point moving averages
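
As a rough illustration of the moving average formula, the sketch below computes trailing 3-point and 5-point averages in plain Python. The function name `moving_average` and the sample series are assumptions made up for this example, not course data.

```python
def moving_average(values, k):
    """Trailing k-point moving average: one value per complete window."""
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and len(values)")
    return [sum(values[i - k + 1 : i + 1]) / k for i in range(k - 1, len(values))]


# Made-up illustrative series (not course data)
series = [10, 14, 9, 15, 11, 16, 12, 17]

ma3 = moving_average(series, 3)
ma5 = moving_average(series, 5)

print("3-point:", [round(v, 2) for v in ma3])
print("5-point:", [round(v, 2) for v in ma5])
```

The 5-point result varies over a narrower range than the 3-point result, which is the "larger k = smoother" effect, at the cost of fewer output values and more lag.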

Knowledge Check 1

A retail store tracks daily sales with random spikes from promotions. The data shows no trend or seasonal pattern. Which method is most appropriate?
A) Double Exponential Smoothing (Holt's method)
B) Simple Moving Average or Single Exponential Smoothing
C) Triple Exponential Smoothing (Holt-Winters)
D) No smoothing needed

Single Exponential Smoothing

The Core Idea

Recent observations matter more than older ones

Key Characteristics

  • Assigns exponentially decreasing weights to past observations
  • Newest data gets highest weight
  • Oldest data gets lowest weight
  • Controlled by parameter α (alpha)
  • Best for data with no clear trend or seasonality

Business Example

ATM Cash Withdrawals

A bank monitors daily cash withdrawals at ATMs. Demand fluctuates from day to day but remains stable overall, with only occasional, irregular spikes.

Solution: Single Exponential Smoothing emphasizes recent withdrawal patterns while smoothing out random daily variations.

Single Exponential Smoothing: The Formula

Ŷt+1 = α × Yt + (1 - α) × Ŷt

Components Explained

  • Ŷt+1 = Forecast for next period
  • Yt = Actual value in current period
  • Ŷt = Forecast for current period
  • α (alpha) = Smoothing constant (0 ≤ α ≤ 1)

Understanding Alpha (α)

α close to 1 (e.g., 0.9):
Heavy weight on recent data → Responsive to changes
α close to 0 (e.g., 0.1):
Heavy weight on historical forecasts → Smooth and stable
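
Unrolling the formula shows where "exponentially decreasing weights" comes from: the observation j periods back receives weight α × (1 - α)^j. The short sketch below prints those weights for the two α values above; the helper name `ses_weights` is hypothetical.

```python
def ses_weights(alpha, n_lags):
    """Weight on the observation j periods back, for j = 0 .. n_lags - 1."""
    return [alpha * (1 - alpha) ** j for j in range(n_lags)]


for alpha in (0.9, 0.1):
    weights = ses_weights(alpha, 5)
    print(f"alpha = {alpha}:", [round(w, 5) for w in weights])
```

With α = 0.9 almost all of the weight sits on the latest observation; with α = 0.1 the weights decay slowly, so many past observations contribute.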

Worked Example: Single Exponential Smoothing

Scenario: Forecasting Monthly Demand (α = 0.9)

Month | Actual Demand (Yt) | Forecast (Ŷt) | Calculation
1 | 13 | - | No prior forecast available
2 | 17 | 13 | Naive forecast (use Y1)
3 | 19 | 16.6 | 0.9×17 + 0.1×13 = 15.3 + 1.3 = 16.6
4 | 23 | 18.76 | 0.9×19 + 0.1×16.6 = 17.1 + 1.66 = 18.76
5 | 24 | 22.58 | 0.9×23 + 0.1×18.76 = 20.7 + 1.88 = 22.58
6 | ? | 23.86 | 0.9×24 + 0.1×22.58 = 21.6 + 2.26 = 23.86
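
The same calculation can be sketched in a few lines of Python. The function name `ses_forecasts` is an assumption for this example; the naive seed (forecast for month 2 = Y1) follows the table.

```python
def ses_forecasts(actuals, alpha):
    """One-step-ahead SES forecasts, seeded with a naive first forecast.

    Returns forecasts for periods 2 .. n+1; the last entry is the
    forecast for the next, not yet observed, period.
    """
    forecasts = [actuals[0]]  # forecast for period 2 is Y1
    for y in actuals[1:]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


demand = [13, 17, 19, 23, 24]  # months 1 to 5 from the worked example
forecasts = ses_forecasts(demand, alpha=0.9)

for month, f in enumerate(forecasts, start=2):
    print(f"Month {month}: forecast = {f:.2f}")
```

The code carries full precision between steps, whereas the table rounds each forecast to two decimals; at this precision the printed values agree with the table.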

Impact of Alpha (α) on Forecasts

Key Insight: Higher α values make the forecast more responsive to recent changes but may amplify noise. Lower α values produce smoother forecasts but may lag behind actual changes.
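
One way to see this trade-off is to feed a sudden level shift through SES with a high and a low α. The step series below is made up for illustration, and `ses_smooth` is a hypothetical helper name.

```python
def ses_smooth(actuals, alpha):
    """SES forecast made after each observation (seeded with Y1)."""
    forecast = actuals[0]
    out = []
    for y in actuals:
        forecast = alpha * y + (1 - alpha) * forecast
        out.append(forecast)
    return out


# Made-up series: level jumps from 10 to 20 at the fifth observation
step_series = [10, 10, 10, 10, 20, 20, 20, 20]

fast = ses_smooth(step_series, alpha=0.9)
slow = ses_smooth(step_series, alpha=0.2)

print("alpha = 0.9:", [round(v, 2) for v in fast])
print("alpha = 0.2:", [round(v, 2) for v in slow])
```

The α = 0.9 forecast is close to the new level after a single period, while the α = 0.2 forecast is still well below it four periods later. The same responsiveness means a high α would also pass more random noise into the forecast.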

Knowledge Check 2

A forecaster uses α = 0.2 in Single Exponential Smoothing. What does this indicate about their forecasting approach?
A) They want forecasts to respond quickly to recent changes
B) They prefer smooth, stable forecasts that emphasize historical patterns
C) The data has strong seasonality
D) They are using double exponential smoothing

Double Exponential Smoothing (Holt's Method)

Problem: Single Exponential Smoothing does not perform well when data has a trend

The Solution

  • Add a trend component to the model
  • Uses two smoothing parameters:
    • α (alpha) for the level
    • β (beta) for the trend
  • Captures both current value and direction of change
  • Best for trending but non-seasonal data

Business Example

E-Commerce Sales Growth

An online retailer experiences steady monthly sales growth due to increasing market penetration, but no seasonal patterns.

Solution: Holt's method captures both the current sales level and the growth trend.

Double Exponential Smoothing: Formulas

1. Level Equation:
Ct = α × Yt + (1 - α) × (Ct-1 + Tt-1)

Smooths the current process level at time t

2. Trend Equation:
Tt = β × (Ct - Ct-1) + (1 - β) × Tt-1

Smooths the trend value at time t

3. Forecast Equation:
Ŷt+1 = Ct + Tt

Combines level and trend to produce forecast
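
The three equations translate almost line for line into code. The sketch below is a minimal version under one assumption the formulas do not specify: the starting values are taken as C1 = Y1 and T1 = Y2 - Y1, a common but not unique choice. The function name `holt_forecasts` and the sample series are hypothetical.

```python
def holt_forecasts(actuals, alpha, beta):
    """One-step-ahead forecasts from Holt's method.

    Assumed initialization (not prescribed by the formulas):
    level C1 = Y1 and trend T1 = Y2 - Y1.
    Returns forecasts for periods 2 .. n+1.
    """
    level = actuals[0]
    trend = actuals[1] - actuals[0]
    forecasts = [level + trend]  # forecast for period 2
    for y in actuals[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)    # level equation
        trend = beta * (level - prev_level) + (1 - beta) * trend  # trend equation
        forecasts.append(level + trend)                           # forecast equation
    return forecasts


# Made-up series growing by exactly 2 units per period
sales = [10, 12, 14, 16, 18]
print([round(f, 2) for f in holt_forecasts(sales, alpha=0.5, beta=0.3)])
```

On a perfectly linear series the forecasts continue the line regardless of α and β, which is a quick sanity check that level and trend are being tracked together.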

Understanding the Smoothing Parameters

Alpha (α) - Level Smoothing

Controls: Influence of recent data on the forecasted value
High α (e.g., 0.8):
Forecast reacts quickly to changes in data level
Low α (e.g., 0.2):
Forecast changes gradually, more stable

Beta (β) - Trend Smoothing

Controls: Influence of recent data on the trend
High β (e.g., 0.8):
Trend responds quickly to changes in direction
Low β (e.g., 0.2):
Trend is smooth and stable
Best Practice: Optimal α and β values are typically found by minimizing an error metric such as RMSE on historical data
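
A simple way to do this is a grid search: try each (α, β) pair, compute the one-step-ahead RMSE, and keep the pair with the lowest value. The sketch below assumes the initialization C1 = Y1, T1 = Y2 - Y1, a 0.1-step grid, and a made-up sales series; library optimizers use more refined searches.

```python
from itertools import product
from math import sqrt


def holt_rmse(actuals, alpha, beta):
    """In-sample RMSE of one-step-ahead Holt forecasts.

    Assumed initialization: C1 = Y1, T1 = Y2 - Y1. The forecast for
    period 2 is then exact by construction, so errors start at period 3.
    """
    level, trend = actuals[0], actuals[1] - actuals[0]
    squared_errors = []
    for t, y in enumerate(actuals[1:], start=2):
        forecast = level + trend
        if t >= 3:
            squared_errors.append((y - forecast) ** 2)
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return sqrt(sum(squared_errors) / len(squared_errors))


def grid_search(actuals, grid):
    """Return (rmse, alpha, beta) for the lowest RMSE over the grid."""
    return min((holt_rmse(actuals, a, b), a, b) for a, b in product(grid, repeat=2))


# Made-up trending series (not course data)
sales = [20, 23, 25, 30, 31, 36, 38, 43, 44, 50]
grid = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

best_rmse, best_alpha, best_beta = grid_search(sales, grid)
print(f"best alpha = {best_alpha}, best beta = {best_beta}, RMSE = {best_rmse:.3f}")
```

Because the parameters are tuned on the same data used to score them, the resulting RMSE is optimistic; checking the chosen pair on held-out periods is a sensible next step.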

Knowledge Check 3

Your company's quarterly revenue has been growing steadily by approximately 5% each quarter. There are no seasonal effects. Which forecasting method should you use?
A) Simple Moving Average
B) Single Exponential Smoothing
C) Double Exponential Smoothing (Holt's method)
D) Triple Exponential Smoothing (Holt-Winters)

Triple Exponential Smoothing (Holt-Winters)

When to Use: Data exhibits both trend and seasonality

Key Features

  • Extends Holt's method with seasonal component
  • Three smoothing parameters:
    • α for level
    • β for trend
    • γ (gamma) for seasonality
  • Handles complex, realistic patterns
  • Two variations:
    • Additive: Seasonal fluctuations are constant
    • Multiplicative: Seasonal fluctuations grow with level
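
The deck does not list the Holt-Winters equations, so the sketch below uses one common additive formulation (written out in the comments) purely as an illustration. The initialization scheme, the function name `holt_winters_additive`, and the synthetic series are all assumptions; statistical libraries estimate starting values and parameters more carefully.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecasts for `horizon` periods ahead.

    One common formulation (m = season length):
      level_t  = alpha * (y_t - s_{t-m}) + (1 - alpha) * (level_{t-1} + trend_{t-1})
      trend_t  = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
      s_t      = gamma * (y_t - level_{t-1} - trend_{t-1}) + (1 - gamma) * s_{t-m}
    Assumed initialization: trend from the change in seasonal means,
    seasonal effects from the detrended first season.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    mean1 = sum(y[:m]) / m
    mean2 = sum(y[m : 2 * m]) / m
    trend = (mean2 - mean1) / m
    # Level one step before the first observation
    level = mean1 - trend * (m - 1) / 2 - trend
    season = [y[i] - (level + trend * (i + 1)) for i in range(m)]

    for t, obs in enumerate(y):
        s_old = season[t % m]
        prev_level = level
        level = alpha * (obs - s_old) + (1 - alpha) * (prev_level + trend)
        new_trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (obs - prev_level - trend) + (1 - gamma) * s_old
        trend = new_trend

    n = len(y)
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]


# Synthetic quarterly series: linear trend plus a fixed seasonal pattern, no noise
pattern = [5, -3, -2, 0]
quarterly = [100 + 2 * t + pattern[t % 4] for t in range(12)]

print(holt_winters_additive(quarterly, m=4, alpha=0.5, beta=0.3, gamma=0.4, horizon=4))
```

Because the synthetic series has no noise, the forecasts simply continue the trend and repeat the seasonal pattern. The multiplicative variant, not shown here, multiplies and divides by the seasonal component instead of adding and subtracting it.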

Business Example

Airline Passenger Demand

Airline passenger numbers show:
  • Long-term growth trend
  • Seasonal peaks (summer holidays)
  • Seasonal dips (off-peak periods)
Solution: Holt-Winters captures level, trend, and seasonal patterns simultaneously.

Identifying Seasonality in Data

Seasonality: Regular, repeating patterns at fixed intervals (monthly, quarterly, yearly)
Examples: Retail sales (holidays), electricity demand (summer/winter), tourism (peak seasons)

Choosing the Right Smoothing Method

DECISION FRAMEWORK

Does your data have a TREND?
│
├─ NO → Does it have SEASONALITY?
│   │
│   ├─ NO → Single Exponential Smoothing ✓
│   │        (or Simple Moving Average)
│   │
│   └─ YES → Seasonal Naive or Decomposition
│
└─ YES → Does it have SEASONALITY?
    │
    ├─ NO → Double Exponential Smoothing ✓
    │        (Holt's Method)
    │
    └─ YES → Triple Exponential Smoothing ✓
             (Holt-Winters Method)

Remember: Always visualize your data first. Plot the time series to identify trends and seasonal patterns before selecting a method.
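
The decision framework can also be written as a small lookup function. This is only a restatement of the framework in code; the function name `choose_smoothing_method` is hypothetical, and deciding whether trend or seasonality is present still relies on plotting the data.

```python
def choose_smoothing_method(has_trend: bool, has_seasonality: bool) -> str:
    """Map the two yes/no questions of the decision framework to a method."""
    if has_trend and has_seasonality:
        return "Triple Exponential Smoothing (Holt-Winters)"
    if has_trend:
        return "Double Exponential Smoothing (Holt's method)"
    if has_seasonality:
        return "Seasonal Naive or Decomposition"
    return "Single Exponential Smoothing (or Simple Moving Average)"


print(choose_smoothing_method(has_trend=True, has_seasonality=False))
```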

Business Applications of Smoothing Methods

Industry | Forecasting Need | Data Pattern | Recommended Method
Banking | Daily ATM cash withdrawals | Stable, random fluctuations | Single Exponential Smoothing
E-Commerce | Monthly online sales | Upward trend, no seasonality | Double Exponential (Holt's)
Airlines | Passenger demand | Trend + seasonal peaks | Triple Exponential (Holt-Winters)
Retail | Product inventory | Stable demand | Moving Average or Single ES
Manufacturing | Production planning | Trend + seasonal orders | Triple Exponential (Holt-Winters)

Knowledge Check 4

You are analyzing monthly electricity demand data. You observe that demand increases steadily each year (trend) and has clear summer and winter peaks (seasonality). Additionally, the seasonal peaks are getting larger as overall demand grows. Which model and type should you use?
A) Holt-Winters with Additive Seasonality
B) Holt-Winters with Multiplicative Seasonality
C) Double Exponential Smoothing
D) Single Exponential Smoothing

Time Series Decomposition

Breaking Down Time Series Components

Time series can be decomposed into distinct components to better understand underlying patterns

Additive Decomposition

Yt = St + Tt + Rt
  • Yt = Observed value
  • St = Seasonal component
  • Tt = Trend-cycle component
  • Rt = Remainder (noise)

Use when: Seasonal fluctuations are roughly constant over time

Multiplicative Decomposition

Yt = St × Tt × Rt
  • Components interact multiplicatively
  • Seasonal effect varies with level
  • More common in business data
  • Can transform to additive using logarithms

Use when: Seasonal fluctuations grow with the trend
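
To make the additive model concrete, the sketch below implements a basic classical decomposition: a centered moving average estimates the trend, and the average detrended value at each seasonal position estimates the seasonal component. This is one simple approach among several, chosen here as an assumption; the function names and the synthetic series are hypothetical.

```python
def centered_moving_average(y, m):
    """Centered moving average of season length m (2 x m average if m is even).

    Positions without a complete window are returned as None.
    """
    half = m // 2
    out = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        total = sum(window)
        if m % 2 == 0:
            # 2 x m average: the two end points get half weight
            total -= 0.5 * (window[0] + window[-1])
        out[t] = total / m
    return out


def decompose_additive(y, m):
    """Split y into trend, seasonal and remainder so that Y = S + T + R."""
    trend = centered_moving_average(y, m)
    detrended = [v - tr if tr is not None else None for v, tr in zip(y, trend)]
    seasonal_means = []
    for pos in range(m):
        vals = [d for i, d in enumerate(detrended) if d is not None and i % m == pos]
        seasonal_means.append(sum(vals) / len(vals))
    offset = sum(seasonal_means) / m  # center seasonal effects on zero
    seasonal = [seasonal_means[i % m] - offset for i in range(len(y))]
    remainder = [
        v - tr - s if tr is not None else None
        for v, tr, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, remainder


# Synthetic quarterly series: linear trend plus a fixed seasonal pattern
pattern = [5, -3, -2, 0]
series = [100 + 2 * t + pattern[t % 4] for t in range(12)]

trend, seasonal, remainder = decompose_additive(series, m=4)
print("seasonal:", [round(s, 2) for s in seasonal[:4]])
print("trend:   ", trend)
```

For a multiplicative series, applying the same procedure to the logarithm of the data is one option, since log(S × T × R) = log S + log T + log R.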

Visualizing Decomposition

Benefit: Decomposition helps identify which components drive your data, informing method selection and improving forecast accuracy

Evaluating Forecast Accuracy

Question: How do we know if our forecast is good?

Error Metrics

We measure the difference between actual values and forecasted values using error metrics. These metrics quantify forecast performance.

RMSE

Root Mean Square Error

Penalizes large errors heavily

MAE

Mean Absolute Error

Average size of errors

MAPE

Mean Absolute Percentage Error

Percentage-based accuracy

Golden Rule: Lower error values = Better forecast performance

Root Mean Square Error (RMSE)

RMSE = √[(1/n) × Σ(Yi - Ŷi)²]

What It Measures

  • Typical magnitude of forecast errors (square root of the average squared error)
  • Same units as the original data
  • Squares errors before averaging (penalizes large errors)
  • More sensitive to outliers than MAE

When to Use

  • When large errors are particularly costly
  • Comparing models on same dataset
  • Most commonly reported metric

Worked Example

Forecast Errors: -10, +5, -3, +8

1. Square each error:
100, 25, 9, 64
2. Average:
(100 + 25 + 9 + 64) / 4 = 49.5
3. Take square root:
√49.5 = 7.04
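
The three steps map directly onto a one-line Python function. The name `rmse_from_errors` is an assumption for this sketch; the error values are those of the worked example.

```python
from math import sqrt


def rmse_from_errors(errors):
    """Square each error, average the squares, then take the square root."""
    return sqrt(sum(e ** 2 for e in errors) / len(errors))


errors = [-10, 5, -3, 8]
print(round(rmse_from_errors(errors), 2))  # 7.04
```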

Mean Absolute Percentage Error (MAPE)

MAPE = (1/n) × Σ|((Yi - Ŷi) / Yi)| × 100%

What It Measures

  • Average percentage error of forecasts
  • Scale-independent (allows comparison across datasets)
  • Easy to interpret (e.g., "5% error")
  • Avoids positive/negative cancellation

Advantages

  • Intuitive interpretation as percentage
  • Can compare accuracy across different products/regions
  • Commonly used in business contexts

Limitations

  • Cannot be used when actual values are zero
  • Asymmetric (penalizes over-forecasts more than under-forecasts)
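
Both limitations are easy to see in code. The sketch below is a minimal MAPE implementation with made-up numbers; the function name `mape` is an assumption.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actuals, forecasts))
    return 100 * total / len(actuals)


# Made-up values: both forecasts miss by 10% of the actual
print(round(mape([100, 200], [90, 220]), 2))

# Asymmetry: with an actual of 100 and non-negative forecasts,
# an under-forecast can cost at most 100%, an over-forecast has no upper limit.
print(round(mape([100], [0]), 2))    # worst possible under-forecast
print(round(mape([100], [300]), 2))  # an over-forecast of the same kind of size
```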

Interpretation Guide

MAPE < 10%
Excellent forecast accuracy
MAPE 10-20%
Good forecast accuracy
MAPE 20-50%
Reasonable forecast accuracy
MAPE > 50%
Poor forecast accuracy

Mean Squared Error (MSE)

MSE = (1/n) × Σ(Yi - Ŷi)²

What It Measures

  • Average of squared errors
  • RMSE = √MSE
  • Used in optimization algorithms
  • Heavily penalizes large errors
Relationship to RMSE:

MSE is in squared units, making interpretation difficult. RMSE converts back to original units by taking the square root.

Comparison of Error Metrics

Metric | Units | Outlier Sensitivity
RMSE | Original units | High
MSE | Squared units | High
MAE | Original units | Low
MAPE | Percentage | Medium
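
The difference in outlier sensitivity can be checked with a small experiment: compute each metric on a set of errors, then replace one error with a much larger one. The error values below are made up, and the function names are assumptions for this sketch.

```python
from math import sqrt


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean squared error (squared units)."""
    return sum(e ** 2 for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error (original units)."""
    return sqrt(mse(errors))


typical = [2, -3, 2, -3]        # made-up forecast errors
with_outlier = [2, -3, 2, -30]  # same errors, one replaced by an outlier

for name, metric in (("MAE", mae), ("MSE", mse), ("RMSE", rmse)):
    before, after = metric(typical), metric(with_outlier)
    print(f"{name}: {before:.2f} -> {after:.2f} (x{after / before:.1f})")
```

The single outlier inflates RMSE proportionally more than MAE, and MSE most of all, because squaring magnifies large errors before they are averaged.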

Practical Example: Calculating Errors

Month | Actual (Y) | Forecast MA (Ŷ) | Forecast ES (Ŷ) | Error MA | Error ES
3 | 19 | 15 | 16.6 | 4 | 2.4
4 | 23 | 18 | 18.76 | 5 | 4.24
5 | 24 | 21 | 22.58 | 3 | 1.42

RMSE (MA)

4.08

RMSE (ES)

2.93

Winner

Exponential Smoothing (lower RMSE)
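
The comparison can be recomputed from the table with a few lines of Python. The function name `rmse` is an assumption; the actual and forecast values are copied from the table above.

```python
from math import sqrt


def rmse(actuals, forecasts):
    """Root mean squared error between actual and forecast values."""
    squared = [(a - f) ** 2 for a, f in zip(actuals, forecasts)]
    return sqrt(sum(squared) / len(squared))


actual = [19, 23, 24]            # months 3 to 5
forecast_ma = [15, 18, 21]       # moving average forecasts
forecast_es = [16.6, 18.76, 22.58]  # exponential smoothing forecasts

rmse_ma = rmse(actual, forecast_ma)
rmse_es = rmse(actual, forecast_es)

print(f"RMSE (MA) = {rmse_ma:.2f}")
print(f"RMSE (ES) = {rmse_es:.2f}")
print("Winner:", "Exponential Smoothing" if rmse_es < rmse_ma else "Moving Average")
```

With only three comparison points the ranking is indicative rather than conclusive; a longer evaluation window gives a more reliable comparison.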

Knowledge Check 5

You are comparing two forecasting models. Model A has RMSE = 15.2 and MAPE = 8.5%. Model B has RMSE = 18.7 and MAPE = 7.2%. Which statement is correct?
A) Model A is clearly better because it has lower RMSE
B) Model B is clearly better because it has lower MAPE
C) Model A has smaller absolute errors, but Model B has better percentage accuracy
D) The models cannot be compared using these metrics

Model Selection: Information Criteria

Beyond Error Metrics: Information criteria help choose between different model types while penalizing complexity

Akaike Information Criterion (AIC)

AIC = -2 × log(Likelihood) + 2k
  • k = number of parameters
  • Penalizes model complexity
  • Use AICc for small samples
  • Lower AIC = Better model

Bayesian Information Criterion (BIC)

BIC = -2 × log(Likelihood) + k × log(n)
  • n = sample size
  • Stronger penalty for complexity than AIC whenever log(n) > 2 (i.e., n ≥ 8)
  • Favors simpler models
  • Lower BIC = Better model
Use Case: When comparing Single vs Double vs Triple Exponential Smoothing, use AIC/BIC to balance fit quality against model complexity
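
Both criteria are straightforward to compute once a log-likelihood is available. The sketch below assumes normally distributed forecast errors so that the log-likelihood can be derived from the residuals; that assumption, the function names, and the residual values are all illustrative. Statistical packages may count k differently (for example, including initial states or the error variance), so values are only comparable within one tool.

```python
from math import log, pi


def gaussian_log_likelihood(residuals):
    """Maximized log-likelihood assuming i.i.d. normal errors."""
    n = len(residuals)
    sigma2 = sum(r ** 2 for r in residuals) / n  # ML estimate of the error variance
    return -0.5 * n * (log(2 * pi * sigma2) + 1)


def aic(log_lik, k):
    """AIC = -2 log(Likelihood) + 2k."""
    return -2 * log_lik + 2 * k


def bic(log_lik, k, n):
    """BIC = -2 log(Likelihood) + k log(n)."""
    return -2 * log_lik + k * log(n)


# Made-up residuals from a hypothetical model with k = 2 parameters
residuals = [1.2, -0.8, 0.5, -1.5, 0.9, -0.3, 1.1, -1.0, 0.4, -0.6]
ll = gaussian_log_likelihood(residuals)

print(f"log-likelihood = {ll:.2f}")
print(f"AIC = {aic(ll, k=2):.2f}")
print(f"BIC = {bic(ll, k=2, n=len(residuals)):.2f}")
```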

Practical Model Comparison

Example: Choosing Between Smoothing Methods

Model | Parameters | RMSE | MAPE | AIC | BIC
Single ES | 1 (α) | 8.45 | 6.2% | 245.3 | 248.7
Double ES | 2 (α, β) | 6.12 | 4.8% | 228.1 | 233.2
Triple ES | 3 (α, β, γ) | 6.08 | 4.7% | 230.5 | 237.3
Analysis: Double ES offers the best balance. Triple ES has marginally better error metrics but higher information criteria due to added complexity.
Decision: Choose Double Exponential Smoothing - simpler model with comparable accuracy
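
The selection logic can be expressed as picking the minimum of each criterion. The sketch below uses the figures from the comparison table; the dictionary layout and variable names are assumptions for illustration.

```python
# Values copied from the model comparison table
models = [
    {"name": "Single ES", "k": 1, "rmse": 8.45, "mape": 6.2, "aic": 245.3, "bic": 248.7},
    {"name": "Double ES", "k": 2, "rmse": 6.12, "mape": 4.8, "aic": 228.1, "bic": 233.2},
    {"name": "Triple ES", "k": 3, "rmse": 6.08, "mape": 4.7, "aic": 230.5, "bic": 237.3},
]

# Lower is better for every criterion
best = {
    criterion: min(models, key=lambda m: m[criterion])["name"]
    for criterion in ("rmse", "mape", "aic", "bic")
}

for criterion, name in best.items():
    print(f"Lowest {criterion.upper()}: {name}")
```

Triple ES has the lowest raw error metrics, while Double ES has the lowest AIC and BIC, which is the disagreement the analysis resolves in favour of the simpler model.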

Software Tools for Implementation

Course Tools

Python

• statsmodels library
• ExponentialSmoothing()
• Full control over parameters
• Programmatic forecasting

Tableau

• Built-in forecasting
• Automatic parameter selection
• Visual exploration
• Business-friendly interface

Exploratory.io

• No-code forecasting
• Automatic decomposition
• Model comparison
• Quick prototyping

Today's Activity: You will implement these methods in Python and visualize forecasts in Tableau

Key Takeaways

1. Smoothing removes noise from time series data to reveal underlying patterns
2. Choose methods based on data characteristics:
  • Stable data → Single Exponential Smoothing
  • Trending data → Double Exponential Smoothing (Holt's)
  • Trending + Seasonal → Triple Exponential Smoothing (Holt-Winters)
3. Alpha (α) controls responsiveness: High α = reactive, Low α = stable
4. Decomposition breaks down data into Seasonal, Trend, and Remainder components
5. Evaluate models using error metrics (RMSE, MAE, MAPE, MSE) and information criteria (AIC, BIC)
6. Lower error values = Better forecasts. Always compare multiple models.

Connecting to Your Assessment

Assessment 3: Individual Forecasting Project

How This Week's Content Helps:

Technical Skills

  • Select appropriate forecasting methods based on your data patterns
  • Implement smoothing techniques in Python
  • Calculate and interpret error metrics
  • Visualize forecasts effectively
  • Justify method selection with data characteristics

Presentation Skills

  • Explain forecasting methods to business stakeholders
  • Present accuracy metrics clearly
  • Justify model choice with evidence
  • Communicate uncertainty and limitations
  • Provide actionable recommendations
Pro Tip: For your assessment, start by visualizing your data to identify trends and seasonality. This will guide your method selection and strengthen your justification.

Summary and Next Steps

Today's Journey

What We Covered

  • Smoothing techniques for noise reduction
  • Single, Double, and Triple Exponential Smoothing
  • Time series decomposition
  • Model evaluation using error metrics
  • Model selection using information criteria
  • Practical business applications

Next Week: Prophet

  • Facebook's Prophet forecasting tool
  • Handling holidays and special events
  • Automatic changepoint detection
  • Business-oriented forecasting at scale
  • Uncertainty intervals and visualization
Action Items:
  • Complete Python activities for hands-on practice
  • Experiment with Tableau forecasting features
  • Review error calculation methods
  • Begin thinking about your Assessment 3 dataset