DATA4400 · Week 3 · Moving Averages, Stationarity & Correlation Analysis

Data-Driven Forecasting

Kaplan Business School · Master of Business Analytics

Section 1 Stationarity

Section 2 Discrete White Noise & Random Walk

Section 3 Differencing

Section 4 Moving Averages

Learning Outcomes

By the end of this lesson you will be able to:

1. Evaluate the concept of stationarity

Identify whether a time series is stationary or non-stationary using visual inspection and statistical tests.

2. Understand and apply differencing

Use first-order and second-order differencing to remove trends and achieve stationarity.

3. Smooth data with moving averages

Calculate and interpret simple, weighted, and centred moving averages to reveal underlying trends.

01 · Stationarity

Before we can build a forecasting model, we need to know whether the data's statistical behaviour is stable over time. This property is called stationarity, and most classical time series models depend on it.

What it is · Why it matters · How to identify it · Stationary vs Non-stationary

Section 1 · Stationarity

1.1 What is Stationarity?

Simple definition

A time series is stationary if its mean (average level) and variance (spread/volatility) remain constant over time — the series does not drift up, drift down, or become more erratic as time passes.

Stationary = stable behaviour

Constant mean (no trend)
Constant variance (no widening spread)
No seasonal component

Non-stationary = changing behaviour

Mean drifts up or down (trend)
Variance grows over time
Recurring seasonal spikes

💡 Think of it this way: if you pick two random windows of your data and they look similar (same average, same spread), the series is likely stationary. If one window looks very different from another, it is non-stationary.
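The two-window comparison can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The simulated data, the window positions, and the helper name `window_stats` are assumptions made for the example, not part of the lesson material.

```python
import random
import statistics

def window_stats(series, start, length):
    """Return (mean, standard deviation) of one window of the series."""
    window = series[start:start + length]
    return statistics.mean(window), statistics.stdev(window)

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical data: noise around 100, and the same noise plus a linear trend.
noise = [random.gauss(0, 5) for _ in range(200)]
stationary = [100 + e for e in noise]
trending = [100 + 0.5 * t + e for t, e in enumerate(noise)]

for name, series in [("stationary", stationary), ("trending", trending)]:
    early_mean, early_sd = window_stats(series, 0, 50)
    late_mean, late_sd = window_stats(series, 150, 50)
    print(f"{name:>10}: early mean {early_mean:6.1f}, late mean {late_mean:6.1f}, "
          f"early sd {early_sd:4.1f}, late sd {late_sd:4.1f}")
```

For the stationary series the two windows report similar means and spreads. For the trending series the late window's mean sits far above the early one, which is the visual cue described above expressed as numbers.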

Three types of non-stationarity

Trend

Mean increases or decreases steadily. Example: total annual revenue for a growing company.

Step change

Mean jumps suddenly at a point in time. Example: website traffic after a viral marketing campaign.

Variance shift

Volatility increases over time. Example: stock prices becoming more volatile in a market crisis.

1.2 Visualising Stationarity

Compare the two series below. Ask yourself: does the average level and spread stay roughly the same throughout?

Series A — Stationary

The series fluctuates around a constant mean (dashed line). The spread does not change. This is stationary.

Series B — Non-stationary (trend)

The mean keeps rising over time. The series is non-stationary. Most forecasting models will not work reliably on this data without transformation.

Forecasting models like ARIMA assume stationarity. If your data is non-stationary, you must transform it first.

1.3 Why Stationarity Matters for Business Forecasting

Statistical stability

Forecasting models calculate a mean and variance from the training data, then assume those values will hold in the future. If the mean keeps shifting (trend), any forecast built from past averages is immediately wrong.

Reliable prediction intervals

For a stationary series, long-run forecasts converge to the mean. If the fluctuations are roughly normally distributed, about 68% of future values fall within mean ± 1 standard deviation and about 95% within mean ± 2 standard deviations. These intervals are easy to interpret and useful for business planning.
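As a rough sketch of the interval arithmetic, the snippet below computes both intervals from a short series. The demand figures are hypothetical, and the intervals assume roughly normal fluctuations around a constant mean.

```python
import statistics

# Hypothetical stationary weekly demand (units); illustrative values only.
demand = [102, 98, 105, 97, 101, 99, 103, 95, 100, 100]

mean = statistics.mean(demand)
sd = statistics.stdev(demand)  # sample standard deviation

interval_68 = (mean - sd, mean + sd)          # mean ± 1 standard deviation
interval_95 = (mean - 2 * sd, mean + 2 * sd)  # mean ± 2 standard deviations

print(f"mean = {mean}, sd = {sd:.2f}")
print(f"approx. 68% interval: {interval_68[0]:.1f} to {interval_68[1]:.1f}")
print(f"approx. 95% interval: {interval_95[0]:.1f} to {interval_95[1]:.1f}")
```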

Modelling the remainder

After decomposing a time series into Trend + Seasonal + Remainder, the Remainder component should behave like a stationary series. Stationarity tests help verify that decomposition has worked correctly.

Business impact example

A retail chain builds a demand forecast on raw monthly sales data (which has a growth trend). Because the model is trained on non-stationary data:

Early-period averages underestimate current demand
The model consistently orders too little stock
Lost sales accumulate — e.g. $200k missed revenue per quarter

Stationarising the data before modelling (for example, by differencing out the trend) removes this systematic bias.

Two conditions for full stationarity

Stationary in mean — average does not change over time
Stationary in variance — spread (standard deviation) does not change over time

Both conditions must hold. A series can be stationary in mean but not variance (e.g. a random walk).

02 · Discrete White Noise & Random Walk

Two fundamental stochastic process models that appear frequently in business data: one stationary (discrete white noise) and one non-stationary (random walk). Understanding these builds intuition for what "pure randomness" and "unpredictable drift" look like.

Discrete White Noise · Random Walk · Random Walk with Drift · Naïve Method

Section 2 · Models

2.1 Discrete White Noise (DWN)

What

Pure random fluctuations with no pattern, no trend, no seasonality. Each observation is independent of all others.

Why it matters

DWN is the ideal residual — if your model's errors look like DWN, there is nothing left to predict. The model has captured everything useful.

When you see it

Random operational noise: day-to-day call centre volume variation, minor cash register fluctuations, sensor readings in a stable process.

Y_t = ε_t,  where ε_t ~ iid(0, σ²)

Key properties: mean = 0, variance = σ² (constant), each value is independent. DWN is stationary.
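These properties can be checked on simulated data. The sketch below draws Gaussian noise, which is one convenient choice of iid(0, σ²) process (an assumption; DWN does not have to be Gaussian), and the helper `lag1_autocorrelation` is an illustrative name.

```python
import random
import statistics

random.seed(1)
sigma = 1.0
n = 5000

# Gaussian draws stand in for iid(0, sigma^2) noise in this illustration.
dwn = [random.gauss(0, sigma) for _ in range(n)]

sample_mean = statistics.mean(dwn)
sample_var = statistics.variance(dwn)

def lag1_autocorrelation(x):
    """Correlation between each value and the one before it."""
    m = statistics.mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[1:], x[:-1]))
    den = sum((a - m) ** 2 for a in x)
    return num / den

rho1 = lag1_autocorrelation(dwn)
print(f"mean ~ {sample_mean:.3f}, variance ~ {sample_var:.3f}, lag-1 autocorrelation ~ {rho1:.3f}")
```

The sample mean should sit near 0, the variance near σ², and the lag-1 autocorrelation near 0, reflecting the independence of consecutive values.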

Business use case — performance monitoring

A website's daily traffic fluctuates randomly between 900–1100 visitors. If this matches DWN properties, the company knows: there is no underlying problem, no trend, no seasonal effect. Any day-to-day difference is just noise — do not react to it. If a trend emerges, that is a meaningful signal worth investigating.

DWN simulation — four realisations

Each coloured line is a different DWN series (σ=1). Notice: they all look different but share the same statistical properties.

2.2 Random Walk & Random Walk with Drift

Random Walk (no drift)

Y_t = Y_{t−1} + ε_t

Today's value = yesterday's value + a random shock. Stationary in mean but not stationary in variance — the spread keeps growing over time.

Example: a drunk person walking — each step is random, but over time they wander further from the start.

Random Walk with Drift (δ)

Y_t = δ + Y_{t−1} + ε_t

A consistent upward (or downward) drift δ is added to each step. Not stationary — both mean and variance change over time.

Positive δ → series trends upward (e.g. economic growth)
Negative δ → series trends downward
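A small simulation makes the difference between the two models concrete. This is a sketch under stated assumptions: Gaussian shocks with σ = 1, a drift of 0.5, and 2,000 simulated paths are all illustrative choices, and `random_walk` is a hypothetical helper.

```python
import random
import statistics

def random_walk(n_steps, drift=0.0, sigma=1.0, start=0.0):
    """Simulate Y_t = drift + Y_{t-1} + eps_t with Gaussian shocks."""
    path = [start]
    for _ in range(n_steps):
        path.append(drift + path[-1] + random.gauss(0, sigma))
    return path

random.seed(7)
n_paths = 2000
plain = [random_walk(100) for _ in range(n_paths)]
drifting = [random_walk(100, drift=0.5) for _ in range(n_paths)]

# Look across all paths at a few fixed time points.
checkpoints = (10, 50, 100)
plain_var = {t: statistics.variance([p[t] for p in plain]) for t in checkpoints}
drift_mean = {t: statistics.mean([p[t] for p in drifting]) for t in checkpoints}

for t in checkpoints:
    print(f"t={t:3d}  variance without drift ~ {plain_var[t]:6.1f}  "
          f"mean with drift ~ {drift_mean[t]:5.1f}")
```

Across paths, the variance of the plain random walk grows roughly in proportion to t, while the mean of the drifting walk grows roughly as δ·t. Neither quantity settles down, which is why both series are non-stationary.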

Business applications

Stock prices — RW with drift (drift = expected return)
GDP, CPI, exchange rates
Long-run demand forecasting

Random Walk — four realisations

Why this is dangerous for forecasting

Because variance grows without bound, long-run prediction intervals become extremely wide. The Naïve method (predict next = last observation) is actually optimal for a random walk.

Naïve forecast: Ŷ_{T+1} = Y_T

The best forecast for a random walk is simply the last observed value.

Knowledge Checkpoint

✓ Checkpoint 1 — Stationarity & White Noise

Question 1 of 5

A company's monthly revenue has grown steadily from $1M to $5M over 5 years. What does this tell you about the series?

A. It is stationary because the variance looks constant.

B. It is non-stationary because the mean is increasing over time.

C. It is stationary because the series has no seasonal component.

D. Stationarity cannot be determined without running ARIMA first.

Question 2 of 5

Your model's residuals (errors) look exactly like Discrete White Noise. What does this mean?

A. The model is under-fitted and needs more variables.

B. The residuals contain a hidden trend you should remove.

C. The model has captured all predictable patterns, so nothing useful remains.

D. You should apply differencing to the residuals before forecasting.

03 · Differencing

Differencing is the primary tool for converting a non-stationary series into a stationary one. It is the pre-processing step that unlocks models like ARIMA and SARIMA. Understanding it builds the intuition for the "I" (Integrated) component in ARIMA.

What it is · Why we use it · First-order differencing · Second-order differencing · Seasonal differencing · ADF & KPSS tests

Section 3 · Differencing

3.1 What is Differencing — and Why?

What

Instead of looking at the raw value at each time point, you compute how much it changed from one period to the next. You subtract yesterday from today.

Why

A trending series has a moving target — the model does not know if a value is "high" or "low" because the whole scale is shifting. Differencing removes that shift and leaves stable fluctuations.

When

Visual plot shows a clear upward/downward trend
ADF test p-value ≥ 0.05 (cannot reject non-stationarity)
ACF plot decays very slowly (does not drop to zero quickly)

First-order: ∇Y_t = Y_t − Y_{t−1}

Four uses of differencing:

① Remove a linear trend
② Remove a stochastic (random) trend
③ Stabilise the mean
④ Prepare data for ARIMA/SARIMA modelling

Step-by-step example

Month	Sales ($)	Difference ∇Y_t
Jan	100	—
Feb	120	120 − 100 = +20
Mar	140	140 − 120 = +20
Apr	160	160 − 140 = +20
May	180	180 − 160 = +20

What happened?

The raw series has a clear upward trend. After differencing, each value is +20 — perfectly stable. The trend has been removed. The differenced series is stationary.

In practice, differences will not be perfectly equal — they will fluctuate around a stable mean, which is what we want.
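The table's calculation can be reproduced with a short function. The helper name `difference` is illustrative; the sales figures are the Jan to May values from the worked example.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

sales = [100, 120, 140, 160, 180]  # Jan to May from the worked example
print(difference(sales))  # [20, 20, 20, 20]
```

Note that the differenced series is one observation shorter than the original, because the first month has no previous value to subtract.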

Seasonal differencing: ∇_p Y_t = Y_t − Y_{t−p}

p = seasonal period (e.g. p=12 for monthly data with annual seasonality)
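A minimal sketch of seasonal differencing on quarterly data (p = 4). The figures are hypothetical: a repeating within-year pattern that rises by 2 each year.

```python
def seasonal_difference(series, p):
    """Subtract the value from the same season one cycle earlier."""
    return [series[t] - series[t - p] for t in range(p, len(series))]

# Hypothetical quarterly sales over three years: same seasonal shape, +2 per year.
quarterly = [10, 20, 30, 40,
             12, 22, 32, 42,
             14, 24, 34, 44]

print(seasonal_difference(quarterly, p=4))  # [2, 2, 2, 2, 2, 2, 2, 2]
```

The seasonal swings cancel because each quarter is compared with the same quarter a year earlier, leaving only the year-on-year change. The first p observations are lost.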

3.2 Before & After Differencing

The chart below shows a non-stationary series (upward trend) and its first-difference. Notice how differencing removes the trend completely.

Original series — non-stationary (trending up)

Mean is not constant — series drifts upward. Cannot be modelled directly.

After first-order differencing — stationary

Fluctuates around a stable mean. Trend has been removed. Ready to model.

First-order differencing (d=1)

Subtracts consecutive observations. Removes linear trends. Sufficient for most business time series.

Second-order differencing (d=2)

Differences the already-differenced series. Use when the series has a quadratic (accelerating) trend and first-order is not enough.

∇²Y_t = Y_t − 2Y_{t−1} + Y_{t−2}
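The sketch below checks, on a hypothetical quadratic series, that applying first-order differencing twice gives the same result as the second-order formula. The function names are illustrative.

```python
def difference(series):
    """First-order difference: series[t] - series[t - 1]."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

def second_difference(series):
    """Second-order difference via the formula Y_t - 2*Y_{t-1} + Y_{t-2}."""
    return [series[t] - 2 * series[t - 1] + series[t - 2]
            for t in range(2, len(series))]

quadratic = [t * t for t in range(6)]   # 0, 1, 4, 9, 16, 25 (accelerating trend)

once = difference(quadratic)            # [1, 3, 5, 7, 9]: still trending
twice = difference(once)                # [2, 2, 2, 2]: constant
print(once, twice, second_difference(quadratic))
```

One round of differencing turns the quadratic trend into a linear one; the second round removes it.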

Caution: over-differencing

Applying too many differences can introduce artificial structure. Use the minimum number of differences needed to achieve stationarity. Rarely need d > 2.

3.3 When to Difference — Formal Tests

1. Visual inspection (always start here)

Plot your time series. If you see a clear upward or downward trend, or a widening spread, differencing is likely needed. This is the fastest and most intuitive check.

2. Augmented Dickey-Fuller (ADF) Test

The most common formal test for non-stationarity.

H₀: Unit root present → series is non-stationary
H₁: No unit root → series is stationary

p-value < 0.05 → reject H₀ → series is stationary (no need to difference)
p-value ≥ 0.05 → fail to reject H₀ → series is non-stationary (difference it)

3. KPSS Test (reverse hypotheses)

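H₀: Series is stationary
H₁: Series has a unit root → is non-stationary

p-value < 0.05 → reject H₀ → series is non-stationary (difference it)
p-value ≥ 0.05 → fail to reject H₀ → no evidence against stationarity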

Use ADF and KPSS together to confirm results — they complement each other because their null hypotheses are opposite.
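Because the two null hypotheses point in opposite directions, the p-values have to be read in opposite ways. The function below is a sketch of that combined reading only; `interpret_tests` is a hypothetical helper, the 0.05 threshold follows the lesson, and the p-values are assumed to come from elsewhere (for example, the `adfuller` and `kpss` functions in statsmodels).

```python
def interpret_tests(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p-values into a single plain-language verdict."""
    adf_says_stationary = adf_p < alpha      # ADF rejects "unit root present"
    kpss_says_stationary = kpss_p >= alpha   # KPSS fails to reject "stationary"

    if adf_says_stationary and kpss_says_stationary:
        return "stationary: both tests agree"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary: both tests agree, difference the series"
    return "tests disagree: inspect the plot and ACF before deciding"

print(interpret_tests(adf_p=0.42, kpss_p=0.01))
print(interpret_tests(adf_p=0.01, kpss_p=0.10))
print(interpret_tests(adf_p=0.01, kpss_p=0.01))
```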

4. Autocorrelation Function (ACF)

If the ACF plot shows autocorrelations that decay slowly (remain high even at large lags), this suggests non-stationarity. After differencing, the ACF should drop to near zero quickly.
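The slow-decay pattern can be seen numerically. The sketch below computes sample autocorrelations for a simulated random walk with drift and for its first difference. The simulation settings (300 points, drift 0.5, Gaussian shocks) and the helper name `acf` are assumptions for illustration.

```python
import random

def acf(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]

random.seed(3)
walk = [0.0]
for _ in range(299):
    walk.append(0.5 + walk[-1] + random.gauss(0, 1))  # random walk with drift

diffed = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

acf_raw = acf(walk, 10)
acf_diff = acf(diffed, 10)
print("raw series  :", [round(r, 2) for r in acf_raw])
print("differenced :", [round(r, 2) for r in acf_diff])
```

The raw series keeps high autocorrelation even at lag 10, while the differenced series shows values close to zero at every lag.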

Decision process

1. Plot the series. Does it trend?
2. Run the ADF test. Is the p-value ≥ 0.05?
3. If yes, apply first-order differencing.
4. Re-test the differenced series and repeat if needed.
5. Stop when the ADF p-value < 0.05 (series is stationary).
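The decision process can be written as a short loop. This is a sketch, not a definitive procedure: `choose_d` is a hypothetical helper, the cap of two differences follows the over-differencing caution, and the p-value function is supplied by the caller. The `stub_adf_pvalue` function here is a stand-in that only mimics a test result so the loop can run without a statistics library; it is not a real ADF test.

```python
def difference(series):
    """First-order difference: series[t] - series[t - 1]."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

def choose_d(series, adf_pvalue, alpha=0.05, max_d=2):
    """Difference repeatedly until adf_pvalue(series) < alpha.

    adf_pvalue is any function that maps a series to a p-value.
    Returns (d, differenced_series).
    """
    current = list(series)
    for d in range(max_d + 1):
        if adf_pvalue(current) < alpha:
            return d, current
        current = difference(current)
    raise ValueError(f"still non-stationary after {max_d} differences")

def stub_adf_pvalue(series):
    """Stand-in for a real ADF test: 'stationary' only if the series is flat."""
    return 0.01 if max(series) - min(series) == 0 else 0.42

sales = [100, 120, 140, 160, 180]
d, stationary_series = choose_d(sales, stub_adf_pvalue)
print(d, stationary_series)  # 1 [20, 20, 20, 20]
```

In real use the stub would be replaced by a function that runs an actual ADF test and returns its p-value.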

Knowledge Checkpoint

✓ Checkpoint 2 — Differencing & Unit Root Tests

Question 3 of 5

You run an ADF test on monthly sales data and get a p-value of 0.42. What should you do next?

A. The series is stationary, so proceed to model it directly.

B. The series is non-stationary, so apply first-order differencing, then re-test.

C. Apply second-order differencing immediately, since p > 0.05.

D. The test is inconclusive, so use KPSS only from now on.

Question 4 of 5

After applying first-order differencing to a time series, the ADF test p-value drops to 0.01. What does this tell you?

A. First-order differencing failed, so you need to apply second-order differencing.

B. The original series was already stationary before differencing.

C. First-order differencing worked, and the series is now stationary (d=1).

D. A p-value of 0.01 means the test was not significant, so no conclusion can be drawn.

04 · Moving Averages & Smoothing Techniques

Moving averages filter out short-term noise to reveal the underlying trend. They are one of the most widely used tools in business analytics, appearing in stock trading dashboards, sales performance reports, and operational monitoring.

Simple Moving Average · Weighted Moving Average · Centred Moving Average · Window size trade-off

Section 4 · Moving Averages

4.1 What is Smoothing and Why Do We Need It?

The problem: noisy data

Raw business data is almost always noisy. A retailer's weekly sales jump up and down due to promotions, weather, public holidays, or simple randomness. This noise masks the underlying trend — the long-run direction that actually matters for planning.

What

Replace each data point with the average of nearby points. Short-term spikes cancel out, revealing the smooth trend.

Why

Distinguish genuine trends from random noise. Avoid over-reacting to a single unusual week.

How

Choose a window size k (number of periods to average). Larger k = smoother, but more lag behind recent changes.

Business rule: do not make a strategic decision based on one data point. Use moving averages to confirm the trend before acting.

Noise vs signal — the core challenge

Grey = raw noisy sales data. Red = 4-week moving average. The trend is only clear after smoothing.

Key trade-off: a larger window removes more noise but reacts more slowly to genuine changes in the trend. A smaller window is more responsive but lets more noise through.

4.2 Simple Moving Average (SMA)

How it works

The SMA at time t uses the k most recent observations, each given equal weight. The "window" slides forward one period at a time.

Ŷ_{t+1} = (Y_t + Y_{t−1} + Y_{t−2} + … + Y_{t−k+1}) / k

where k = number of periods in the moving average (window size)

Worked example — 3-period SMA

Quarter	Demand ($M)	3-period SMA
Q1	4.71	—
Q2	4.75	—
Q3	4.63	(4.71+4.75+4.63)/3 = 4.70
Q4	4.74	(4.75+4.63+4.74)/3 = 4.71
Q5	4.19	(4.63+4.74+4.19)/3 = 4.52
Q6 (forecast)	—	4.52
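The worked example can be reproduced with a short function. The name `simple_moving_average` is illustrative; the demand figures are the Q1 to Q5 values from the table.

```python
def simple_moving_average(series, k):
    """Trailing k-period SMA; None where fewer than k observations exist."""
    out = []
    for t in range(len(series)):
        if t + 1 < k:
            out.append(None)
        else:
            out.append(sum(series[t - k + 1:t + 1]) / k)
    return out

demand = [4.71, 4.75, 4.63, 4.74, 4.19]  # Q1 to Q5 demand ($M)
sma3 = simple_moving_average(demand, 3)
rounded = [None if v is None else round(v, 2) for v in sma3]
print(rounded)  # [None, None, 4.7, 4.71, 4.52]

q6_forecast = sma3[-1]  # the latest SMA value serves as the next-period forecast
```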

When to use SMA: no apparent trend; seasonal data (set k = seasonal period). Use as a benchmark before trying more complex models.

Important notes

The first (k−1) periods have no SMA value — not enough previous observations yet
The SMA is centred at time t − (k−1)/2, not at time t — it lags behind the present
If data has seasonality, set k to the seasonal period (e.g. k=4 for quarterly, k=12 for monthly)
SMA is better for smoothing and exploration than for multi-step forecasting

Weighted Moving Average (WMA)

A variant that gives more weight to recent observations and less weight to older ones. More responsive to recent changes than plain SMA.

Example: for k=3, you might assign weights of 0.5, 0.3, 0.2 (most recent gets 0.5). The weights must sum to 1.
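A minimal sketch of the weighted version, applied to the same quarterly demand figures used in the SMA worked example. The function name and the convention that the first weight applies to the most recent observation are assumptions for illustration.

```python
def weighted_moving_average(series, weights):
    """WMA of the last len(weights) values; weights[0] goes to the newest value."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    k = len(weights)
    newest_first = list(reversed(series[-k:]))
    return sum(w * y for w, y in zip(weights, newest_first))

demand = [4.71, 4.75, 4.63, 4.74, 4.19]
wma = weighted_moving_average(demand, [0.5, 0.3, 0.2])
print(round(wma, 3))  # 0.5*4.19 + 0.3*4.74 + 0.2*4.63 = 4.443
```

The result (4.443) is lower than the 3-period SMA of the same window (4.52) because the most recent, lower value of 4.19 carries half the weight.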

Limitation of SMA for forecasting

SMA lags behind genuine trend changes. If sales are rising, the SMA will consistently underestimate the current level. For long-range forecasting, exponential smoothing (Week 4) handles trend much better.

4.3 Centred Moving Average (CMA)

The problem with even-period smoothing

For seasonal data with an even period (e.g. k=4 quarters, k=12 months), the plain moving average falls between two time points — not at any actual observation. This creates a misalignment that makes it impossible to estimate seasonal effects accurately.

The solution: centring

Take the average of the k-period MA going back from t and the k-period MA going back from t−1. This centres the average exactly at time t − k/2.

(0.5·Y_t + Y_{t−1} + … + Y_{t−k+1} + 0.5·Y_{t−k}) / k

Note: the first and last observations in the window get half weight (0.5). All others get full weight.
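The half-weight scheme can be sketched as follows. The function name, the choice to align each result with the centre of its window, and the quarterly figures are illustrative assumptions.

```python
def centred_moving_average(series, k):
    """Centred MA for an even window k, aligned with the middle observation.

    Each result uses k + 1 observations: the two end points get weight 0.5
    and the k - 1 interior points get weight 1. Ends of the output are None.
    """
    if k % 2 != 0:
        raise ValueError("this sketch handles even k only")
    half = k // 2
    out = [None] * len(series)
    for c in range(half, len(series) - half):
        window = series[c - half:c + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        out[c] = total / k
    return out

# Hypothetical quarterly data: strong seasonal shape, rising by 2 per year.
quarterly = [10, 20, 30, 40, 12, 22, 32, 42]
print(centred_moving_average(quarterly, 4))
# [None, None, 25.25, 25.75, 26.25, 26.75, None, None]
```

The seasonal swings disappear and the output rises by 0.5 per quarter, which matches the underlying growth of 2 per year. The first and last k/2 positions have no value.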

Why CMA matters

CMA is the correct smoothing method used inside classical time series decomposition (Trend + Seasonal + Remainder). It ensures the trend estimate is aligned with the data, allowing seasonal factors to be estimated accurately.

SMA vs WMA vs CMA — quick comparison

Method	Equal weights?	Best for
SMA	Yes	Quick trend smoothing, no seasonality
WMA	No (recent = more)	Series where recent data matters more
CMA	Approx. equal	Seasonal decomposition (even periods)

Limitations of all moving averages

No values at the start of the series (trailing averages) or at both the start and end (centred averages)
Always lags behind the actual current level
Poor for multi-step ahead forecasting — use exponential smoothing or ARIMA instead
Good for data exploration, imputation, and trend extraction

Moving averages are exploratory tools — they help you understand the data before building a formal forecasting model.

4.4 Window Size — Noise vs Lag Trade-off

Compare how the smoothed line changes as the window size grows. A larger window removes more noise but creates a longer lag behind the actual data.

Small window (k=3)

Closely follows the raw data. Removes some noise, but still shows many short-term fluctuations. Reacts quickly to genuine changes. Use for short-term operational monitoring.

Medium window (k=7)

Balanced smoothing. Removes most week-to-week noise while still tracking medium-term trends. A good default starting point for most business series.

Large window (k=14)

Very smooth — longer-run trend is very clear. However, significant lag — the smoothed line does not yet reflect very recent changes. Use for strategic trend analysis.
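The lag side of the trade-off can be measured directly. In this sketch a hypothetical series jumps from 100 to 120, and the code counts how many periods each trailing SMA needs before it fully reflects the new level. The data and helper name are assumptions for illustration.

```python
def trailing_sma(series, k):
    """Trailing k-period SMA; None until k observations are available."""
    return [None if t + 1 < k else sum(series[t - k + 1:t + 1]) / k
            for t in range(len(series))]

step_at = 20
series = [100] * step_at + [120] * 20  # hypothetical step change at t = 20

lag_by_window = {}
for k in (3, 7, 14):
    sma = trailing_sma(series, k)
    # First period at or after the step where the SMA equals the new level.
    caught_up = next(t for t in range(step_at, len(series)) if sma[t] == 120)
    lag_by_window[k] = caught_up - step_at

print(lag_by_window)  # {3: 2, 7: 6, 14: 13}
```

A trailing window of k periods takes k − 1 periods after the change to catch up completely, so the k=14 line is still blending in old values almost two weeks after a k=3 line has adjusted.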

4.5 The Naïve Forecasting Method

What it is

The simplest possible forecasting method: the forecast for next period equals the most recent observed value. Nothing else is considered.

Ŷ_{T+1} = Y_T

T = current period. The forecast for T+1 is simply whatever happened at T.

Why it works (when it does)

When data follows a random walk, the naïve method is mathematically optimal. There is no exploitable pattern — the best guess for tomorrow is what happened today.

When to use it

As a benchmark baseline — any more complex model should outperform naïve, or it is not worth using
When data shows random walk behaviour (e.g. stock prices)
Very short-term operational decisions (next hour, next day)

Naïve as benchmark — business rule

Before presenting a forecasting model to stakeholders, always compare it to the naïve method. If your ARIMA or Prophet model can't beat "just repeat last period's value", question whether the added complexity is justified.

Example: A logistics company forecasts weekly parcel volumes using a naïve model (RMSE = 850). Their new ML model gives RMSE = 420, cutting the error by about 51% (1 − 420/850), which is likely worth the investment. If the ML model's RMSE were 840, an improvement of only about 1%, the added complexity would not be justified.
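The benchmark comparison comes down to a ratio of errors. The sketch below shows one way to compute RMSE and the relative reduction against the naïve benchmark; the function names are illustrative, and the RMSE values are the ones quoted in the example.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def rmse_reduction(model_rmse, naive_rmse):
    """Fractional reduction in RMSE relative to the naive benchmark."""
    return 1 - model_rmse / naive_rmse

print(f"{rmse_reduction(420, 850):.1%}")  # 50.6%
print(f"{rmse_reduction(840, 850):.1%}")  # 1.2%
```

A value near zero means the model barely improves on repeating the last observation; a negative value means it is worse than the benchmark.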

Seasonal naïve variant

Instead of using the last period, use the observation from the same period last year (or last season). Useful for highly seasonal data like retail (Christmas week = last Christmas week).

Ŷ_{T+h} = Y_{T+h−m}   (for h ≤ m)

m = seasonal period. h = forecast horizon. For horizons longer than one season, the most recent full season is repeated.
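Both naïve variants fit in a few lines. This is a sketch with illustrative function names and hypothetical quarterly data (m = 4); for horizons beyond one season it repeats the last observed season.

```python
def naive_forecast(history, horizon=1):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * horizon

def seasonal_naive_forecast(history, m, horizon):
    """Use the value from the same season in the most recent full cycle."""
    last_season_start = len(history) - m
    return [history[last_season_start + (h - 1) % m] for h in range(1, horizon + 1)]

# Hypothetical quarterly sales over two years.
history = [10, 20, 30, 40, 12, 22, 32, 42]

print(naive_forecast(history, horizon=2))                # [42, 42]
print(seasonal_naive_forecast(history, m=4, horizon=6))  # [12, 22, 32, 42, 12, 22]
```

On strongly seasonal data the plain naïve forecast carries the peak quarter's value into every future period, while the seasonal variant preserves the within-year shape.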

Knowledge Checkpoint

✓ Checkpoint 3 — Moving Averages

Question 5 of 5

A retail analyst uses a 30-day SMA to monitor daily sales. She notices the smoothed line is below the raw data for the past two weeks. What is the most likely explanation?

A. The SMA window is too small and should be reduced to 7 days.

B. Sales have been increasing recently, and the SMA lags behind because it still reflects older, lower values.

C. The analyst should switch to a CMA to fix the lag issue.

D. The raw data has an error, and negative sales values are pulling the SMA down.

Key insight

This is the fundamental lag problem of trailing moving averages. A large window gives a smoother line but always represents the past average, not the present level. For real-time business decisions, the lag matters.

One practical solution: use a shorter window for operations (react faster) and a longer window for strategy (see the bigger trend). Many analysts plot both simultaneously.

Lesson 3 · Summary

Summary — What We Covered

01 · Stationarity

Constant mean and variance over time
Required by ARIMA, SARIMA, and related models
Three types of non-stationarity: trend, step change, variance shift
Check visually, then formally with ADF/KPSS

02 · DWN & Random Walk

DWN: pure noise, no pattern, ideal model residual
Random Walk: today = yesterday + random shock (non-stationary in variance)
Random Walk with Drift: adds a consistent upward/downward component
Naïve method is optimal for random walk data

03 · Differencing

∇Y_t = Y_t − Y_{t−1} removes linear trends
Second-order removes quadratic trends
Seasonal differencing removes seasonal patterns
ADF test: p ≥ 0.05 → difference; p < 0.05 → stationary

04 · Moving Averages

SMA: equal weight to k most recent periods
WMA: more weight to recent observations
CMA: use for even-period seasonal decomposition
Larger window → smoother but more lag
Best for exploration, not long-range forecasting

05 · Key Business Rules

Always check stationarity before modelling
Use minimum differencing needed — don't over-difference
Naïve forecast = essential benchmark for any model
Moving averages: distinguish noise from signal before reacting
Residuals should look like DWN — if not, the model is incomplete

06 · Coming Up Next

Week 4: Exponential Smoothing — a more sophisticated smoothing method that handles trend and seasonality
Holt-Winters model
Performance metrics: MAE, RMSE, MAPE
Business-oriented model evaluation

Understanding stationarity and differencing is the foundation for ARIMA (Week 7) — the d in ARIMA(p,d,q) is the order of differencing.