DATA4400

Week 6: Advanced Time Series Modeling

ARIMA & SARIMA Models

Understanding stationarity, differencing, and seasonal patterns

Today's Agenda

Recap: Smoothing Techniques

What are Smoothing Techniques?

Aim: Reduce the impact of random variation in time series data so that underlying patterns are easier to identify.

These techniques produce a modified time series plot that is smoother and less irregular than the original.

Smoothing Technique 1: Moving Average

Simple Moving Average

MA(n) = (Yt + Yt-1 + ... + Yt-n+1) / n
Purpose: Smooths short-term fluctuations to reveal trends
Effect: Each data point is replaced by the average of the n most recent observations
Trade-off: Larger windows = smoother but less responsive
Use Case: Identifying long-term trends in noisy data
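
As a minimal sketch of the MA(n) formula above, the function below averages the latest n observations at each step. The function name and the window size n = 3 are illustrative choices; the four values are the passenger counts from the Air Passengers table in these slides.

```python
def moving_average(series, n):
    """Trailing simple moving average: mean of the latest n observations."""
    if n <= 0 or n > len(series):
        raise ValueError("window n must be between 1 and len(series)")
    # The first average is available once n observations have been seen.
    return [sum(series[i - n + 1 : i + 1]) / n for i in range(n - 1, len(series))]


passengers = [112, 118, 132, 129]  # passenger counts from the slides
smoothed = moving_average(passengers, 3)
print(smoothed)  # two averages: 362/3 and 379/3
```

A larger n averages over more points, which gives a smoother line but a shorter and slower-reacting output series.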

Smoothing Technique 2: Exponential Smoothing

Exponential Smoothing Formula

St = αYt + (1-α)St-1
α (alpha): Smoothing parameter between 0 and 1
Weight Pattern: More weight to recent observations
Advantage: Responds quickly to changes while smoothing
Flexibility: α controls responsiveness vs. smoothness
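
The recursion St = αYt + (1-α)St-1 can be sketched in a few lines. Two assumptions are made here that the slide does not state: the first smoothed value is initialised to the first observation, and the data values are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Apply S_t = alpha * Y_t + (1 - alpha) * S_{t-1} to a list of values."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # assumption: initialise S_1 = Y_1
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


data = [100, 110, 105, 120]  # hypothetical observations
s = exponential_smoothing(data, 0.5)
print(s)  # [100, 105.0, 105.0, 112.5]
```

With α close to 1 the smoothed series tracks the data closely; with α close to 0 it changes slowly and is smoother.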

Quiz: Smoothing Techniques

Which smoothing technique gives more weight to recent observations?
A) Simple Moving Average
B) Centered Moving Average
C) Exponential Smoothing
D) All give equal weights

Recap: Autoregressive (AR) Model

AR(1) Model Example

Yt = μ + α(Yt-1 - μ) + εt

Real-world interpretation: Today's temperature depends on yesterday's temperature plus some random variation.

Key Characteristic

Current value depends on previous values in the series

Stationarity Condition

For AR(1): -1 < α < 1
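
One way to see why the stationarity condition matters is to look at multi-step forecasts. For the AR(1) model above, the k-step-ahead forecast is μ + αᵏ(Yt - μ), so forecasts pull back toward μ only when |α| < 1. The sketch below uses hypothetical temperature values; the function name is an illustrative choice.

```python
def ar1_forecast(last_value, mu, alpha, steps):
    """k-step-ahead AR(1) forecasts: mu + alpha**k * (last_value - mu)."""
    return [mu + alpha ** k * (last_value - mu) for k in range(1, steps + 1)]


# Hypothetical: long-run mean 20 degrees, today's temperature 28 degrees.
stationary = ar1_forecast(28.0, 20.0, 0.5, 3)
unit_root = ar1_forecast(28.0, 20.0, 1.0, 3)
print(stationary)  # [24.0, 22.0, 21.0] -> reverts toward the mean
print(unit_root)   # [28.0, 28.0, 28.0] -> never reverts when alpha = 1
```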

AR Model: Data-Driven Example

Air Passengers Data (Monthly)

Month     Passengers   Previous Month   Change
2010-01   112          -                -
2010-02   118          112              +6
2010-03   132          118              +14
2010-04   129          132              -3

In an AR model, we predict the current month's passengers from the previous month(s), weighted by the strength of the relationship (the α coefficient).

Recap: ARMA Model

ARMA combines AR + MA components

Yt - μ = α₁(Yt-1 - μ) + ... + αp(Yt-p - μ) + εt + β₁εt-1 + ... + βqεt-q

AR Component

Looks at previous values in the time series (like predicting weather based on past few days)

MA Component

Considers past errors to improve predictions (learning from past mistakes)

Think of ARMA: Guessing what happens next by combining patterns from the past (AR) with corrections from past mistakes (MA)

ARMA: Data-Driven Example

Retail Sales Forecasting

AR part: This month's sales depend on last month's sales

MA part: Adjust prediction based on how wrong we were last month

Month   Actual Sales   Predicted                                Error
Jan     $100K          $95K                                     +$5K
Feb     $110K          $105K                                    +$5K
Mar     ?              AR: $108K; MA: +$2.5K; Total: $110.5K    -
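
The March forecast in the table can be reproduced with a one-line ARMA(1,1) update. This is a sketch under one assumption: the MA coefficient β = 0.5 is inferred from the table (a +$2.5K correction on a +$5K February error) and is not stated in the slides. The AR part of $108K is taken directly from the table.

```python
def arma11_forecast(ar_part, beta, last_error):
    """One-step ARMA(1,1) forecast: AR prediction plus beta times the last error."""
    return ar_part + beta * last_error


feb_error = 110 - 105   # actual minus predicted for February, in $K
beta = 0.5              # assumption: implied by the +2.5 correction in the table
forecast = arma11_forecast(108.0, beta, feb_error)
print(forecast)  # 110.5
```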

Quiz: AR vs ARMA

What does the MA component in ARMA models primarily help with?
A) Looking at more historical data points
B) Learning from past forecasting errors
C) Increasing the model complexity
D) Handling seasonal patterns

Autocorrelation Function (ACF)

What is ACF?

The ACF, plotted as a correlogram, shows the correlation between y at time t and y at time t-k, where k is the lag.

r(k) = Σ(yt - ȳ)(yt-k - ȳ) / Σ(yt - ȳ)²
Purpose: Measure how current values relate to past values
Range: Values between -1 and +1
Lag k: Time periods back (1, 2, 3, ... typically up to 20)
Use: Guide for choosing ARMA model order
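
The r(k) formula above translates directly into code. The sketch below is a plain-Python version for illustration, with a short hypothetical series; the function name is an assumption, not a library call.

```python
def acf(series, max_lag):
    """Sample autocorrelations r(0), r(1), ..., r(max_lag)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    result = []
    for k in range(max_lag + 1):
        # Numerator pairs each y_t with y_{t-k}, for every t that has a lag-k partner.
        num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
        result.append(num / denom)
    return result


r = acf([1, 2, 3, 4, 5], 2)  # hypothetical series
print(r)  # approximately [1.0, 0.4, -0.1]
```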

Partial Autocorrelation Function (PACF)

Understanding PACF

PACF shows the direct correlation at each lag, removing the effect of intermediate lags.

ACF

Shows total correlation including indirect effects through intermediate lags

PACF

Shows only direct correlation, controlling for intermediate values

Analogy: ACF is like total influence of a grandparent on grandchild (direct + through parent). PACF is only the direct grandparent influence.
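
One standard way to obtain the PACF from the ACF is the Durbin-Levinson recursion, sketched below. The slides do not say which method is used, so treat this as one possible illustration. The input is the theoretical ACF of an AR(1) process with α = 0.5, namely r(k) = 0.5ᵏ, for which the PACF should be 0.5 at lag 1 and zero afterwards.

```python
def pacf_from_acf(r):
    """Durbin-Levinson recursion.

    r[0] must be 1 and r[k] the autocorrelation at lag k.
    Returns the partial autocorrelations at lags 1..len(r)-1.
    """
    pacf = []
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(r)):
        if k == 1:
            phi_kk = r[1]
            current = [phi_kk]
        else:
            num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1 - sum(prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
            current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


ar1_acf = [0.5 ** k for k in range(5)]  # theoretical ACF of AR(1), alpha = 0.5
partial = pacf_from_acf(ar1_acf)
print(partial)  # [0.5, 0.0, 0.0, 0.0]
```

The ACF decays gradually (1, 0.5, 0.25, ...) while the PACF cuts off after lag 1, which is the AR signature described on the next slide.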

Identifying Models with ACF & PACF

AR(p) Model

PACF: Cuts off after lag p

ACF: Decays gradually

Example: PACF significant at lags 1,2 then cuts off → AR(2)

MA(q) Model

ACF: Cuts off after lag q

PACF: Decays gradually

Example: ACF significant at lags 1,2,3 then cuts off → MA(3)

[Figures: ACF plot and PACF plot for an AR(2) process]

ACF & PACF: Data Example

AR(2) Process Pattern

Key observations from typical AR(2) plots:

  • PACF: Strong correlation at lag 1 and lag 2, then cuts off
  • ACF: Gradually decaying pattern
  • Interpretation: Current value depends directly on two previous values

Illustrative Application: A stationary series, such as temperature fluctuations around a seasonal mean, where today's value depends directly on the values from yesterday and the day before, but not directly on older values.

Quiz: ACF & PACF

If PACF cuts off after lag 2 and ACF decays gradually, what model is suggested?
A) MA(2)
B) AR(2)
C) ARMA(1,1)
D) ARIMA(2,1,0)

Today's Main Focus

Unit Root Testing
SARIMA Models
ARIMA Models
Differencing & Stationarity

We'll build from stationarity concepts up to advanced seasonal models

Understanding Differencing

Differencing: Removing Trends from Data

First Order Difference: ΔYt = Yt - Yt-1
Second Order Difference: Δ²Yt = ΔYt - ΔYt-1

Purpose: Transform non-stationary data into stationary data by removing trends

Differencing: Step-by-Step Example

Stock Price Example

Day   Price ($)   1st Diff   2nd Diff
1     100         -          -
2     102         +2         -
3     105         +3         +1
4     109         +4         +1
5     114         +5         +1

Visual Differencing Process

Observation: Original prices show a strong upward trend. First differences still trend upward. Second differences are constant (+1), so the series is stationary after differencing twice.
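
The stock price table can be reproduced with a small differencing helper. This is a plain-Python sketch; the function name and the `order` parameter are illustrative.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: each pass computes Y_t - Y_{t-1}."""
    result = list(series)
    for _ in range(order):
        result = [b - a for a, b in zip(result, result[1:])]
    return result


prices = [100, 102, 105, 109, 114]  # prices from the stock price example
first = difference(prices, order=1)
second = difference(prices, order=2)
print(first)   # [2, 3, 4, 5]
print(second)  # [1, 1, 1]
```

Each pass shortens the series by one observation, which is why the table has dashes in its first rows.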

Why Differencing is Crucial

Before: Non-Stationary (Trending)
After: Stationary (Differenced)

Non-Stationary Problems

  • Mean changes over time
  • Variance may increase
  • Relationships are unstable
  • Predictions become unreliable

After Differencing Benefits

  • Stable mean around zero
  • Consistent variance
  • Reliable model parameters
  • Better forecasting accuracy

Simple Rule: If your data has a trend, try first differencing. If it still has a trend, try second differencing. Most real data needs at most 2 orders of differencing.

Quiz: Differencing Practice

Given time series: 10, 15, 21, 30, 45, 66, 91

Question 1: Is this time series stationary?

A) Yes, it's stationary
B) No, it shows a clear upward trend
C) Cannot determine from this data
D) Only the variance is non-stationary

Differencing Practice (Continued)

Let's Apply First Differencing

Original   1st Difference   2nd Difference
10         -                -
15         5                -
21         6                1
30         9                3
45         15               6
66         21               6
91         25               4

Analysis: First differences still show an upward trend (5, 6, 9, 15, 21, 25). Second differences (1, 3, 6, 6, 4) are more stable but still not perfectly stationary.

ARIMA: The Complete Model

ARIMA(p,d,q) = AR + I + MA

AR (p)

Autoregressive terms - how many past values to include

I (d)

Integration (differencing) - how many times to difference the data

MA (q)

Moving Average terms - how many past errors to include

Key Insight: ARIMA = ARMA applied to differenced (stationary) data

ARIMA vs AR and ARMA

AR Model

Requirement: Data must already be stationary

Example: AR(2) for temperature fluctuations around seasonal mean

ARMA Model

Requirement: Data must already be stationary

Example: ARMA(1,1) for detrended stock returns

ARIMA Model

Advantage: Can handle non-stationary data by including differencing step

Example: ARIMA(1,1,1) for stock prices with trend - difference once to remove trend, then apply ARMA(1,1)

ARIMA Model: The Process

1. Check if data is stationary
2. If not, apply d differences
3. Fit ARMA(p,q) to differenced data
4. Use model for forecasting

Mathematical Form:

(1 - α₁L - α₂L² - ... - αₚLᵖ)(1 - L)ᵈYₜ = (1 + β₁L + β₂L² + ... + β_qL^q)εₜ

Where L is the lag operator and d is the degree of differencing
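
As a worked case, setting p = d = q = 1 in the lag-operator form and expanding shows what ARIMA(1,1,1) says about the series itself. The derivation below uses only the symbols already defined (LYₜ = Yₜ₋₁ and ΔYₜ = Yₜ - Yₜ₋₁), with no constant term, matching the general form above.

```latex
\begin{align*}
(1 - \alpha_1 L)(1 - L)\,Y_t &= (1 + \beta_1 L)\,\varepsilon_t \\
\Delta Y_t - \alpha_1 \Delta Y_{t-1} &= \varepsilon_t + \beta_1 \varepsilon_{t-1} \\
Y_t &= Y_{t-1} + \alpha_1 (Y_{t-1} - Y_{t-2}) + \varepsilon_t + \beta_1 \varepsilon_{t-1}
\end{align*}
```

The last line reads: today's value equals yesterday's value, plus a fraction of yesterday's change, plus the new shock and a fraction of the previous shock.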

ARIMA Model: Simple Interpretation

ARIMA(1,1,1) Process: Stock Prices

ARIMA(1,1,1) Example: Stock Prices

Step 1: Original stock prices have upward trend (non-stationary)

Step 2: Take first difference → daily price changes (stationary)

Step 3: Model daily changes using ARMA(1,1):

  • Today's change depends on yesterday's change (AR part)
  • Plus adjustment based on yesterday's forecast error (MA part)

Step 4: To forecast prices, predict changes then add to last known price
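
Steps 2 to 4 can be sketched as a one-step forecast function. The coefficients α = 0.5 and β = 0.25 and the last forecast error of 2.0 are hypothetical values chosen for illustration; in practice they would be estimated from the data. The model is assumed to have no constant term.

```python
def arima111_forecast(prices, alpha, beta, last_error):
    """One-step ARIMA(1,1,1) forecast with no constant term.

    Predict the next change from the last change (AR part) and the last
    forecast error (MA part), then add it to the last known price.
    """
    last_change = prices[-1] - prices[-2]                    # Step 2: first difference
    predicted_change = alpha * last_change + beta * last_error  # Step 3: ARMA(1,1)
    return prices[-1] + predicted_change                     # Step 4: back to price level


prices = [100, 102, 105, 109, 114]  # prices from the differencing example
next_price = arima111_forecast(prices, alpha=0.5, beta=0.25, last_error=2.0)
print(next_price)  # 117.0
```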

How to Choose p, d, q Parameters

Choosing d (Integration Order)

  • Plot the data - look for trends
  • Apply unit root tests
  • Start with d=1 for trending data
  • Most economic data needs d=1 or d=2

Choosing p and q

  • Use ACF/PACF plots on differenced data
  • PACF cuts off at lag p → AR(p)
  • ACF cuts off at lag q → MA(q)
  • Information criteria (AIC, BIC)

Practical Approach

Try several combinations and choose the one with best fit and lowest AIC/BIC

Quiz: ARIMA Models

What does the 'd' parameter in ARIMA(p,d,q) represent?
A) The number of autoregressive terms
B) The degree of differencing needed to make data stationary
C) The number of moving average terms
D) The seasonal period length

Seasonal ARIMA (SARIMA)

SARIMA(p,d,q)(P,D,Q)s

Regular ARIMA Part

p,d,q: Handle short-term patterns

Same as regular ARIMA

Seasonal Part

P,D,Q: Handle seasonal patterns

s: Seasonal period (12 for monthly, 4 for quarterly)

When to Use: Data shows both short-term correlations AND seasonal patterns

SARIMA vs ARIMA: When to Use Each?

Use Regular ARIMA When:

  • No obvious seasonal patterns
  • Monthly/quarterly data without seasonal cycle
  • Financial returns (typically no seasonality)
  • Daily data over short periods

Use SARIMA When:

  • Clear seasonal patterns (monthly, quarterly)
  • Retail sales (holiday effects)
  • Temperature data
  • Tourism, agriculture data

Key Question: Does the data show patterns that repeat every s time periods?

SARIMA: Detailed Example

Monthly Retail Sales: SARIMA(1,1,1)(1,1,1)₁₂

Regular part (1,1,1):

  • First difference to remove trend
  • Current month depends on previous month
  • Adjust for last month's forecast error

Seasonal part (1,1,1)₁₂:

  • First seasonal difference (Jan 2024 - Jan 2023)
  • Current January depends on previous January
  • Adjust for last January's seasonal error

Result: Model captures both month-to-month changes AND year-to-year seasonal patterns

SARIMA: Visual Understanding

Monthly Air Passengers: Trend + Seasonality

Air Passengers Data Pattern

Observed Patterns:

  • Trend: Overall increase in passengers over time
  • Seasonality: Peak in summer months (June-Aug), low in winter
  • Month-to-Month: Some correlation between consecutive months

SARIMA Approach:

• Remove trend with regular differencing (d=1)

• Remove seasonal pattern with seasonal differencing (D=1, s=12)

• Model remaining correlations with AR/MA terms
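
The two differencing steps can be illustrated on a synthetic series. Everything in the sketch below is hypothetical: a monthly series built from a linear trend plus a fixed 12-month seasonal pattern, with no noise, so that the effect of each step is exact.

```python
def difference(series):
    """Regular first difference: Y_t - Y_{t-1} (d = 1)."""
    return [b - a for a, b in zip(series, series[1:])]


def seasonal_difference(series, s):
    """Seasonal difference: Y_t - Y_{t-s} (D = 1 with period s)."""
    return [series[t] - series[t - s] for t in range(s, len(series))]


# Hypothetical 3 years of monthly data: trend of +2 per month plus a seasonal shape.
seasonal_shape = [0, 1, 3, 6, 10, 15, 18, 17, 9, 4, 2, 1]
series = [100 + 2 * t + seasonal_shape[t % 12] for t in range(36)]

detrended = difference(series)                      # trend removed, seasonality remains
deseasonalised = seasonal_difference(detrended, 12)  # seasonality removed as well

print(detrended[:12] == detrended[12:24])  # True: same pattern repeats every 12 months
print(deseasonalised)                      # all zeros for this noise-free series
```

With real data the result is not exactly zero; what remains is the correlation structure that the AR and MA terms are meant to model.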

How to Choose P,D,Q,p,d,q,s Parameters?

Seasonal Parameters

s: Known from data frequency

  • Monthly data: s=12
  • Quarterly data: s=4
  • Daily data: s=7 or s=365

P,D,Q: Use ACF/PACF at seasonal lags

Regular Parameters

p,d,q: Same process as regular ARIMA

  • Plot differenced data
  • Check ACF/PACF patterns
  • Start with simple models
  • Use information criteria

Common Starting Points: SARIMA(1,1,1)(1,1,1)s or SARIMA(0,1,1)(0,1,1)s

Unit Root Testing: Why Do We Need It?

The Stationarity Question

Before fitting ARIMA models, we need to answer: "How many times should we difference our data to make it stationary?"

Visual Inspection Problems

  • Subjective interpretation
  • Difficult with noisy data
  • May miss subtle trends
  • Not reliable for complex patterns

Unit Root Tests Benefits

  • Objective statistical test
  • Clear p-value interpretation
  • Handles complex cases
  • Industry standard approach

Unit Root Tests: The Concept

What is a Unit Root?

A time series has a unit root if shocks have permanent effects (non-stationary)

Test equation: ΔYt = α + βYt-1 + error

Unit Root Present (β = 0)

Series is non-stationary

Need to difference the data

No Unit Root (β < 0)

Series is stationary

Can use levels in modeling

Augmented Dickey-Fuller (ADF) Test

Most Common Unit Root Test

ADF: ΔYt = α + βYt-1 + γ₁ΔYt-1 + ... + γₚΔYt-p + εt

Null Hypothesis (H₀): β = 0 (unit root exists, series is non-stationary)

Alternative Hypothesis (H₁): β < 0 (no unit root, series is stationary)

ADF Test Distribution

Decision Rule:

• If p-value < 0.05: Reject H₀ → Series is stationary

• If p-value ≥ 0.05: Fail to reject H₀ → Cannot rule out a unit root; treat the series as non-stationary

Interpreting ADF Test Results

Example Results

Series          ADF Statistic   p-value   Conclusion
Stock Prices    -1.45           0.56      Non-stationary (has trend)
Stock Returns   -8.92           0.00      Stationary
Temperature     -2.85           0.05      Borderline (check further)

Practical Steps:

1. Run ADF test on original data

2. If p > 0.05, take first difference and test again

3. If still p > 0.05, take second difference

4. Use the differenced data that passes the test
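
The test-difference-retest loop can be sketched in plain Python. This is a deliberately simplified illustration, not a full ADF implementation: it uses the basic Dickey-Fuller regression ΔYt = α + βYt-1 + error with no lagged difference terms, and it compares the t-statistic of β with an approximate large-sample 5% critical value of -2.86 instead of computing a p-value. All function names and the simulated data are assumptions; real analyses would normally rely on a statistics library.

```python
import math
import random


def difference(series):
    return [b - a for a, b in zip(series, series[1:])]


def df_t_statistic(series):
    """t-statistic of beta in dY_t = a + beta * Y_{t-1} + error (no lag terms)."""
    x = series[:-1]
    y = difference(series)
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    a = y_bar - beta * x_bar
    rss = sum((yi - a - beta * xi) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    return beta / se_beta


CRITICAL_5PCT = -2.86  # approximate 5% Dickey-Fuller value (constant, no trend)


def choose_d(series, max_d=2):
    """Difference until the simplified test rejects a unit root, up to max_d."""
    for d in range(max_d):
        if df_t_statistic(series) < CRITICAL_5PCT:
            return d
        series = difference(series)
    return max_d


# Simulated data (hypothetical): a stationary AR(1) series and a random walk.
random.seed(0)
noise = [random.gauss(0, 1) for _ in range(500)]
stationary = [noise[0]]
for e in noise[1:]:
    stationary.append(0.5 * stationary[-1] + e)
walk, level = [], 0.0
for e in noise:
    level += e
    walk.append(level)

print(choose_d(stationary))  # 0: the unit root is rejected in levels
print(choose_d(walk))        # usually 1 for a random walk
```

The test statistic is compared with Dickey-Fuller critical values, not the usual t-table, because the statistic does not follow a t-distribution under the unit-root null.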

Other Unit Root Tests

Phillips-Perron (PP) Test

  • Similar to ADF but corrects the test statistic differently
  • Robust to serial correlation and heteroskedasticity in the errors
  • Non-parametric correction (no lagged difference terms needed)

KPSS Test

  • Opposite null hypothesis
  • H₀: Series is stationary
  • Good as complementary test

Best Practice Approach

Use multiple tests and look for consistent results. If ADF fails to reject a unit root and KPSS rejects stationarity, both point to non-stationarity and you can be confident about differencing.

Unit Root Testing: Complete Workflow

1. Plot the data visually
2. Run ADF test on original series
3. If non-stationary, difference and retest
4. Confirm with additional tests (PP, KPSS)

Typical Results Pattern

  • Most economic series: Need d=1 (first difference)
  • Prices, GDP, stock indices: Usually I(1)
  • Returns, growth rates: Usually I(0) - already stationary
  • Few series need d=2: Be cautious, since choosing d=2 often means the data has been over-differenced

Final Quiz: Putting It All Together

You have monthly sales data showing upward trend and seasonal pattern. ADF test p-value = 0.8 on original data, p-value = 0.02 after first differencing. What model should you consider?
A) ARIMA(1,0,1) - data is already stationary
B) SARIMA(p,1,q)(P,1,Q)₁₂ - needs differencing and seasonal modeling
C) ARIMA(1,2,1) - needs second differencing
D) Simple exponential smoothing

Week 6 Summary

Key Concepts Mastered

  • Smoothing: Moving averages and exponential smoothing reduce noise
  • AR/ARMA: Autoregressive and mixed models for stationary data
  • ACF/PACF: Tools to identify appropriate model orders
  • ARIMA: Extension to handle non-stationary data through differencing
  • SARIMA: Additional seasonal components for periodic patterns
  • Unit Root Tests: Objective statistical tests for stationarity

Next Steps

Practice with real datasets, experiment with parameter selection, and compare different models using information criteria.

Thank You!

Questions & Discussion

Next Week Preview

Week 7: Advanced ARIMA applications, model diagnostics, and forecasting evaluation

Remember to practice with the provided datasets and explore the Activity 9 ARIMA notebook!