Understanding stationarity, differencing, and seasonal patterns
Aim: Reduce the impact of random variation in time series data so that underlying patterns can be identified more clearly.
These techniques produce a modified time series plot that is smoother and less irregular than the original.
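The slides do not say which smoothing techniques are meant. One common example is a trailing simple moving average, sketched below: each smoothed point is the mean of the last few observations, which damps random spikes. The series is made-up illustration data, and `moving_average` is a hypothetical helper, not a library call.

```python
def moving_average(values, window):
    """Trailing simple moving average: the mean of the last `window` points.

    Positions before the first full window are skipped, so the output is
    shorter than the input by window - 1 values.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        smoothed.append(sum(chunk) / window)
    return smoothed


# Made-up noisy series smoothed with a 3-point window
noisy = [10, 14, 9, 15, 11, 16, 12]
print(moving_average(noisy, 3))
```

A wider window gives a smoother line but reacts more slowly to genuine changes in level.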
Real-world interpretation: Today's temperature depends on yesterday's temperature plus some random variation.
Current value depends on previous values in the series
Stationarity condition for AR(1) (Yₜ = αYₜ₋₁ + εₜ): -1 < α < 1
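To see why the AR(1) model Yₜ = αYₜ₋₁ + εₜ needs -1 < α < 1, one can trace a single unit shock through it with no further shocks: the response is αᵗ, which dies out only when |α| < 1. This is a minimal deterministic sketch; `ar1_shock_response` is a hypothetical helper.

```python
def ar1_shock_response(alpha, steps):
    """Trace one unit shock through Y_t = alpha * Y_{t-1} + e_t.

    e_0 = 1 and every later e_t = 0, so the path is Y_t = alpha ** t.
    """
    y = 1.0  # the shock arrives at t = 0
    path = [y]
    for _ in range(steps):
        y = alpha * y  # no new shocks after t = 0
        path.append(y)
    return path


print(ar1_shock_response(0.5, 4))   # |alpha| < 1: the shock fades away
print(ar1_shock_response(1.0, 4))   # alpha = 1: the shock never fades (unit root)
print(ar1_shock_response(-0.5, 4))  # negative alpha: fades while alternating in sign
```

With α = 1 the shock has a permanent effect, which is exactly the unit-root case that makes a series non-stationary.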
| Month | Passengers | Previous Month | Change |
|---|---|---|---|
| 2010-01 | 112 | - | - |
| 2010-02 | 118 | 112 | +6 |
| 2010-03 | 132 | 118 | +14 |
| 2010-04 | 129 | 132 | -3 |
In an AR model, we predict the current month's passengers from the previous month(s), weighted by the strength of the relationship (the α coefficient).
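The "Previous Month" and "Change" columns in the table come from shifting the series by one step. The sketch below rebuilds them from the passenger values and makes a one-step AR(1) prediction Ŷₜ = c + αYₜ₋₁. The intercept c and the coefficient α used here are hypothetical values for illustration, not estimates from this data.

```python
passengers = {"2010-01": 112, "2010-02": 118, "2010-03": 132, "2010-04": 129}


def lag_table(series):
    """Return (month, value, previous value, change) rows.

    The first month has no previous value, so its lag and change are None.
    """
    rows = []
    previous = None
    for month, value in series.items():
        change = None if previous is None else value - previous
        rows.append((month, value, previous, change))
        previous = value
    return rows


def ar1_predict(previous_value, c, alpha):
    """One-step AR(1) forecast: Y_hat_t = c + alpha * Y_{t-1}."""
    return c + alpha * previous_value


for row in lag_table(passengers):
    print(row)

# Hypothetical coefficients, NOT estimated from the table
print(ar1_predict(129, c=10.0, alpha=0.9))
```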
AR (Autoregressive): Looks at previous values in the time series (like predicting the weather from the past few days)
MA (Moving Average): Considers past forecast errors to improve predictions (learning from past mistakes)
Intuition for ARMA: Predict what happens next by combining patterns from past values (AR) with corrections based on past mistakes (MA)
AR part: This month's sales depend on last month's sales
MA part: Adjust prediction based on how wrong we were last month
| Month | Actual Sales | Predicted | Error |
|---|---|---|---|
| Jan | $100K | $95K | +$5K |
| Feb | $110K | $105K | +$5K |
| Mar | ? | AR: $108K + MA: $2.5K = $110.5K | - |
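The March forecast in the table splits into an AR part and an MA part. The sketch below computes a one-step ARMA(1,1) forecast as (c + α·Yₜ₋₁) + β·εₜ₋₁. The table does not give coefficients, so c = 20, α = 0.8 and β = 0.5 are hypothetical values chosen only because they reproduce the table's $108K and +$2.5K split from February's sales ($110K) and error (+$5K).

```python
def arma11_forecast(prev_value, prev_error, c, alpha, beta):
    """One-step ARMA(1,1) forecast, returned as (AR part, MA part, total).

    AR part: c + alpha * y_{t-1}
    MA part: beta * e_{t-1}  (correction for last period's error)
    """
    ar_part = c + alpha * prev_value
    ma_part = beta * prev_error
    return ar_part, ma_part, ar_part + ma_part


# Values in $K: Feb actual = 110, Feb error = +5.
# c, alpha, beta are hypothetical, picked to match the table's split.
ar, ma, total = arma11_forecast(prev_value=110, prev_error=5, c=20, alpha=0.8, beta=0.5)
print(ar, ma, total)
```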
The correlogram shows correlation between y at time t and y at time t-k, where k is the lag.
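The values a correlogram plots can be computed directly. Below is a minimal sketch of the usual sample autocorrelation rₖ = Σ(yₜ − ȳ)(yₜ₋ₖ − ȳ) / Σ(yₜ − ȳ)², applied to a made-up alternating series whose lag-1 correlation is strongly negative. `sample_acf` is a hypothetical helper name.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag.

    r_k = sum_t (y_t - mean)(y_{t-k} - mean) / sum_t (y_t - mean)^2
    """
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    acf = []
    for k in range(max_lag + 1):
        num = sum(dev[t] * dev[t - k] for t in range(k, n))
        acf.append(num / denom)
    return acf


alternating = [1, -1] * 5  # 1, -1, 1, -1, ... (10 values)
print(sample_acf(alternating, 3))
```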
PACF shows the direct correlation at each lag, removing the effect of intermediate lags.
ACF: Shows total correlation, including indirect effects through intermediate lags
PACF: Shows only the direct correlation, controlling for the intermediate values
Analogy: ACF is like total influence of a grandparent on grandchild (direct + through parent). PACF is only the direct grandparent influence.
AR(p) signature:
• PACF: Cuts off after lag p
• ACF: Decays gradually
Example: PACF significant at lags 1,2 then cuts off → AR(2)
MA(q) signature:
• ACF: Cuts off after lag q
• PACF: Decays gradually
Example: ACF significant at lags 1,2,3 then cuts off → MA(3)
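The AR identification rule can be checked numerically. The sketch below converts autocorrelations into partial autocorrelations with the Durbin-Levinson recursion (one standard method; the slides do not specify one). It feeds in the theoretical ACF of a hypothetical AR(2) with coefficients 0.5 and 0.3: the ACF decays gradually, while the PACF equals 0.3 at lag 2 and drops to zero afterwards.

```python
def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations at lags 1..max_lag via Durbin-Levinson.

    rho[k] is the autocorrelation at lag k, with rho[0] == 1.
    """
    pacf = []
    prev = []  # phi_{k-1, 1..k-1}: AR coefficients from the previous order
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def ar2_theoretical_acf(a1, a2, max_lag):
    """ACF of a stationary AR(2): rho_1 = a1/(1-a2), rho_k = a1*rho_{k-1} + a2*rho_{k-2}."""
    rho = [1.0, a1 / (1.0 - a2)]
    for k in range(2, max_lag + 1):
        rho.append(a1 * rho[k - 1] + a2 * rho[k - 2])
    return rho


rho = ar2_theoretical_acf(0.5, 0.3, 5)  # hypothetical AR(2) coefficients
print([round(r, 3) for r in rho])                    # decays gradually
print([round(p, 3) for p in pacf_from_acf(rho, 5)])  # cuts off after lag 2
```

On real data the sample PACF will not be exactly zero beyond lag p; values inside the significance bands are treated as zero.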
Key observations from typical AR(2) plots:
Real Application: Stock prices where today's price depends on yesterday's and day-before-yesterday's prices, but not directly on older prices.
We'll build from stationarity concepts up to advanced seasonal models
Purpose: Transform non-stationary data into stationary data by removing trends
| Day | Price ($) | 1st Diff | 2nd Diff |
|---|---|---|---|
| 1 | 100 | - | - |
| 2 | 102 | +2 | - |
| 3 | 105 | +3 | +1 |
| 4 | 109 | +4 | +1 |
| 5 | 114 | +5 | +1 |
Observation: Original prices show strong upward trend. First differences still trending up. Second differences are stationary (constant +1).
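Differencing is a one-line operation: each pass replaces yₜ with yₜ − yₜ₋₁. This short sketch reproduces the price table above; `difference` is a hypothetical helper name.

```python
def difference(values, order=1):
    """Apply first differencing `order` times: each pass maps y_t -> y_t - y_{t-1}."""
    result = list(values)
    for _ in range(order):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


prices = [100, 102, 105, 109, 114]
print(difference(prices, 1))  # [2, 3, 4, 5]  first differences
print(difference(prices, 2))  # [1, 1, 1]     second differences
```

Each pass shortens the series by one observation, so d orders of differencing cost d data points.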
Simple Rule: If your data has a trend, try first differencing. If it still has a trend, try second differencing. Most real data needs at most 2 orders of differencing.
Question 1: Is this time series stationary?
| Original | 1st Difference | 2nd Difference |
|---|---|---|
| 10 | - | - |
| 15 | 5 | - |
| 21 | 6 | 1 |
| 30 | 9 | 3 |
| 45 | 15 | 6 |
| 66 | 21 | 6 |
| 91 | 25 | 4 |
Analysis: First differences still show an upward trend (5, 6, 9, 15, 21, 25). Second differences (1, 3, 6, 6, 4) are more stable but still not perfectly stationary, so the original series is not stationary.
p: Autoregressive terms, how many past values to include
d: Integration (differencing), how many times to difference the data
q: Moving average terms, how many past errors to include
Key Insight: ARIMA = ARMA applied to differenced (stationary) data
AR(p):
• Requirement: Data must already be stationary
• Example: AR(2) for temperature fluctuations around a seasonal mean
ARMA(p,q):
• Requirement: Data must already be stationary
• Example: ARMA(1,1) for detrended stock returns
ARIMA(p,d,q):
• Advantage: Can handle non-stationary data by including a differencing step
• Example: ARIMA(1,1,1) for stock prices with a trend: difference once to remove the trend, then apply ARMA(1,1)
Mathematical Form:
$$(1 - \alpha_1 L - \alpha_2 L^2 - \cdots - \alpha_p L^p)(1 - L)^d Y_t = (1 + \beta_1 L + \beta_2 L^2 + \cdots + \beta_q L^q)\varepsilon_t$$
where L is the lag operator (LYₜ = Yₜ₋₁), d is the degree of differencing, α₁…αₚ are the AR coefficients and β₁…β_q are the MA coefficients.
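Expanding the differencing factor shows how d connects to the difference tables earlier: d = 1 gives the first difference, and d = 2 gives the second difference. For the price table, (1 − L)²Y₅ = 114 − 2·109 + 105 = 1, matching the constant second difference.

$$(1 - L)Y_t = Y_t - Y_{t-1}, \qquad (1 - L)^2 Y_t = (1 - 2L + L^2)Y_t = Y_t - 2Y_{t-1} + Y_{t-2}$$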
Step 1: Original stock prices have upward trend (non-stationary)
Step 2: Take first difference → daily price changes (stationary)
Step 3: Model daily changes using ARMA(1,1): ΔYₜ = α₁ΔYₜ₋₁ + εₜ + β₁εₜ₋₁, where ΔYₜ = Yₜ − Yₜ₋₁
Step 4: To forecast prices, predict changes then add to last known price
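Steps 3 and 4 can be sketched as a small forecasting function, assuming a zero-mean ARMA(1,1) on the changes, ΔYₜ = α₁ΔYₜ₋₁ + εₜ + β₁εₜ₋₁. The coefficients and last error below are hypothetical (in practice they come from fitting the model), and the prices are the first three days of the earlier price table.

```python
def arima111_price_forecast(prices, last_error, alpha1, beta1, steps):
    """Forecast price levels from an ARMA(1,1) model of first differences.

    Change model: dY_t = alpha1 * dY_{t-1} + e_t + beta1 * e_{t-1}.
    Future errors have expected value zero, so the MA term only affects step 1.
    """
    last_change = prices[-1] - prices[-2]
    level = prices[-1]
    forecasts = []
    change = alpha1 * last_change + beta1 * last_error  # step-1 change forecast
    for h in range(steps):
        if h > 0:
            change = alpha1 * change  # later steps: AR part only
        level += change  # undo the differencing: add the change to the last level
        forecasts.append(level)
    return forecasts


# Hypothetical coefficients and last error; prices from the difference table
print(arima111_price_forecast([100, 102, 105], last_error=1.0,
                              alpha1=0.5, beta1=0.4, steps=3))
```

The cumulative sum in `level += change` is what turns forecasts of changes back into forecasts of prices.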
Try several (p, d, q) combinations and choose the model with the best fit and the lowest AIC/BIC (lower values are better)
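AIC and BIC both reward a higher maximized log-likelihood and penalize extra parameters: AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂. The sketch below applies them to three candidate orders. The log-likelihoods, parameter counts and sample size are hypothetical numbers, not results from a real fit, and conventions for counting k (e.g. whether the error variance is included) vary between software packages.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fitted models: (p, d, q) -> (maximized log-likelihood, number of parameters)
candidates = {
    (1, 1, 0): (-520.4, 2),
    (0, 1, 1): (-518.9, 2),
    (1, 1, 1): (-517.2, 3),
}
n_obs = 200  # hypothetical sample size

best_aic = min(candidates, key=lambda order: aic(*candidates[order]))
best_bic = min(candidates, key=lambda order: bic(*candidates[order], n_obs))
print("lowest AIC:", best_aic)
print("lowest BIC:", best_bic)
```

Here the two criteria disagree: BIC's per-parameter penalty (ln 200 ≈ 5.3) is larger than AIC's (2), so it prefers the smaller model.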
SARIMA(p,d,q)(P,D,Q)s notation:
p,d,q: Handle short-term patterns (same as regular ARIMA)
P,D,Q: Handle seasonal patterns
s: Seasonal period (12 for monthly, 4 for quarterly)
When to Use: Data shows both short-term correlations AND seasonal patterns
Key Question: Does the data show patterns that repeat every s time periods?
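Example: SARIMA(1,1,1)(1,1,1)₁₂ for monthly data
Regular part (1,1,1): one AR term at lag 1, one first difference (d=1), one MA term at lag 1
Seasonal part (1,1,1)₁₂: one seasonal AR term at lag 12, one seasonal difference (D=1, Yₜ − Yₜ₋₁₂), one seasonal MA term at lag 12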
Result: Model captures both month-to-month changes AND year-to-year seasonal patterns
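Written with the lag operator, using the same sign convention as the ARIMA form, SARIMA(1,1,1)(1,1,1)₁₂ multiplies a regular factor and a seasonal factor on each side. A₁ and B₁ are labels introduced here for the seasonal AR and MA coefficients; the slides do not name them.

$$(1 - \alpha_1 L)(1 - A_1 L^{12})(1 - L)(1 - L^{12})Y_t = (1 + \beta_1 L)(1 + B_1 L^{12})\varepsilon_t$$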
Observed Patterns:
SARIMA Approach:
• Remove trend with regular differencing (d=1)
• Remove seasonal pattern with seasonal differencing (D=1, s=12)
• Model remaining correlations with AR/MA terms
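The two differencing steps above can be sketched on a made-up quarterly series (s = 4) built from a quadratic trend plus a repeating seasonal pattern. Seasonal differencing removes the repeating pattern but leaves a linear trend; a regular first difference on top removes that too. The series and helper names are hypothetical.

```python
def seasonal_difference(values, s):
    """Seasonal differencing (1 - L^s): y_t - y_{t-s}."""
    return [values[t] - values[t - s] for t in range(s, len(values))]


def difference(values):
    """Regular first differencing (1 - L): y_t - y_{t-1}."""
    return [values[t] - values[t - 1] for t in range(1, len(values))]


# Hypothetical quarterly series: quadratic trend + seasonal pattern repeating every 4 periods
season = [5, -2, -6, 3]
series = [t * t + season[t % 4] for t in range(12)]

after_D = seasonal_difference(series, s=4)  # D=1: seasonal pattern gone, linear trend left
after_d_and_D = difference(after_D)         # d=1 on top: trend gone, constant left
print(after_D)
print(after_d_and_D)
```

The two operators commute, so applying the regular difference first and the seasonal difference second gives the same result.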
s: Known from data frequency
P,D,Q: Use ACF/PACF at seasonal lags (s, 2s, 3s, ...)
p,d,q: Same process as regular ARIMA
Common Starting Points: SARIMA(1,1,1)(1,1,1)s or SARIMA(0,1,1)(0,1,1)s
Before fitting ARIMA models, we need to answer: "How many times should we difference our data to make it stationary?"
A time series has a unit root if shocks have permanent effects (non-stationary)
If a unit root is present:
• Series is non-stationary
• Need to difference the data
If there is no unit root:
• Series is stationary
• Can use levels in modeling
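ADF test regression: ΔYₜ = c + βYₜ₋₁ + γ₁ΔYₜ₋₁ + ... + γₖΔYₜ₋ₖ + εₜ (a trend term can also be included)
Null Hypothesis (H₀): β = 0 (unit root exists, series is non-stationary)
Alternative Hypothesis (H₁): β < 0 (no unit root, series is stationary)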
Decision Rule:
• If p-value < 0.05: Reject H₀ → Series is stationary
• If p-value ≥ 0.05: Fail to reject H₀ → Treat the series as having a unit root (difference it)
| Series | ADF Statistic | p-value | Conclusion |
|---|---|---|---|
| Stock Prices | -1.45 | 0.56 | Non-stationary (has trend) |
| Stock Returns | -8.92 | 0.00 | Stationary |
| Temperature | -2.85 | 0.05 | Borderline (check further) |
Practical Steps:
1. Run ADF test on original data
2. If p > 0.05, take first difference and test again
3. If still p > 0.05, take second difference
4. Use the differenced data that passes the test
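The practical steps above amount to a short loop. In this sketch the unit-root test is passed in as a function returning a p-value; with statsmodels installed, one option is `lambda x: adfuller(x)[1]` from `statsmodels.tsa.stattools`. The stub used in the usage example is a toy stand-in (it calls a constant series "stationary") and is not a real statistical test.

```python
def difference(values):
    """First difference: y_t - y_{t-1}."""
    return [b - a for a, b in zip(values, values[1:])]


def choose_differencing_order(series, unit_root_pvalue, alpha=0.05, max_d=2):
    """Difference until the unit-root test rejects H0 (p < alpha), up to max_d times.

    unit_root_pvalue: callable returning a p-value for H0 "the series has a unit root".
    Returns (d, differenced series), or (None, series differenced max_d times)
    if no order up to max_d passes the test.
    """
    current = list(series)
    for d in range(max_d + 1):
        if unit_root_pvalue(current) < alpha:
            return d, current
        if d < max_d:
            current = difference(current)
    return None, current


# Toy stand-in for a real test: only a constant series counts as "stationary"
toy_pvalue = lambda x: 0.01 if max(x) == min(x) else 0.5
print(choose_differencing_order([100, 102, 105, 109, 114], toy_pvalue))
```

A real ADF test needs far more than five observations; the tiny series here only exercises the control flow.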
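Use multiple tests and look for consistent results. KPSS has the opposite null hypothesis (H₀: series is stationary), so "agreement" means ADF fails to reject a unit root while KPSS rejects stationarity. When both point to non-stationarity, you can be confident about differencing.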
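Because the two tests have opposite null hypotheses, their p-values have to be read in opposite directions before they can be compared. A minimal sketch of that bookkeeping follows. The ADF p-value 0.56 is the stock-price row from the table above, and the KPSS p-values are hypothetical; `combine_adf_kpss` is a hypothetical helper.

```python
def combine_adf_kpss(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF and KPSS results, whose null hypotheses are opposite.

    ADF  H0: unit root (non-stationary) -> small p-value suggests stationarity.
    KPSS H0: stationary                 -> small p-value suggests non-stationarity.
    """
    adf_says_stationary = adf_pvalue < alpha
    kpss_says_stationary = kpss_pvalue >= alpha
    if adf_says_stationary and kpss_says_stationary:
        return "both indicate stationary: model the levels"
    if not adf_says_stationary and not kpss_says_stationary:
        return "both indicate non-stationary: difference the data"
    return "tests disagree: inspect the series further before choosing d"


print(combine_adf_kpss(0.56, 0.01))  # stock prices (KPSS value hypothetical)
```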
Practice with real datasets, experiment with parameter selection, and compare different models using information criteria.
Week 7: Advanced ARIMA applications, model diagnostics, and forecasting evaluation
Remember to practice with the provided datasets and explore the Activity 9 ARIMA notebook!