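Differencing is a technique used to make a non-stationary time series stationary by removing trends and seasonality. The first difference is \(y'_t = y_t - y_{t-1}\), which removes a linear trend; a seasonal difference with period \(s\) is \(y'_t = y_t - y_{t-s}\), which removes a pattern that repeats every \(s\) steps.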
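A minimal pure-Python sketch of both kinds of differencing on small lists. The `difference` helper is hypothetical and not part of any library; it simply evaluates \(y_t - y_{t-\text{lag}}\), so `lag=1` gives the first difference and `lag=s` a seasonal difference.

```python
def difference(series, lag=1):
    """Return [y_t - y_{t-lag}] for t = lag .. len(series) - 1.

    lag=1 is the ordinary first difference; lag=s is a seasonal difference
    for a season of length s. The first `lag` observations are lost.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


trend = [2, 4, 6, 8, 10]               # linear trend
print(difference(trend))               # [2, 2, 2, 2]: constant, trend removed

seasonal = [1, 5, 2, 6, 3, 7]          # upward drift plus a period-2 pattern
print(difference(seasonal, lag=2))     # [1, 1, 1, 1]: pattern removed
```

In pandas, `Series.diff(periods=...)` performs the same operation but keeps the original length by filling the first positions with NaN.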
Stationarity is important because most time series forecasting methods assume that the data is stationary.
Test your time series data for stationarity using the Augmented Dickey-Fuller (ADF) test.
A time series is stationary if its statistical properties (mean, variance, autocorrelation) do not change over time. Most time series models assume stationarity, so we often need to transform non-stationary data.
The ADF test checks for a unit root in the time series. The presence of a unit root indicates non-stationarity.
Null Hypothesis (H₀): The time series has a unit root (non-stationary)
Alternative Hypothesis (H₁): The time series does not have a unit root (stationary)
If the p-value is less than the significance level (typically 0.05), we reject the null hypothesis and conclude that the time series is stationary.
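ADF test regression model:
\[ \Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t \]
where \(\Delta y_t = y_t - y_{t-1}\), \(\alpha\) is a constant, \(\beta t\) is an optional deterministic trend, the \(p\) lagged differences absorb serial correlation in the errors, and \(\varepsilon_t\) is white noise.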
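The test statistic is the t-statistic for the \(\gamma\) coefficient. A unit root corresponds to \(\gamma = 0\), so if \(\gamma\) is significantly less than zero we reject H₀ and treat the series as stationary. Under H₀ this statistic does not follow the usual t-distribution, so it is compared against Dickey-Fuller critical values instead.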
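A minimal sketch of the idea, assuming the simplest case: a plain (non-augmented) Dickey-Fuller regression with a constant, no trend term (\(\beta = 0\)) and no lagged differences (\(p = 0\)), fitted by ordinary least squares in pure Python. It returns the t-statistic for \(\gamma\) and compares it with about -2.86, the approximate asymptotic 5% Dickey-Fuller critical value for this constant-only case; it does not compute a p-value. The function names are hypothetical. For real work, `adfuller` in `statsmodels.tsa.stattools` runs the full augmented test and reports the statistic, p-value, and critical values.

```python
import math
import random

# Approximate asymptotic 5% critical value of the Dickey-Fuller t-statistic
# for the regression with a constant and no trend.
DF_CRITICAL_5PCT = -2.86


def dickey_fuller_t(y):
    """t-statistic for gamma in  dy_t = alpha + gamma * y_{t-1} + e_t.

    Simplified case: constant only, no trend and no lagged differences.
    """
    if len(y) < 4:
        raise ValueError("need at least 4 observations")
    lagged = y[:-1]                                   # y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]       # delta y_t
    n = len(dy)
    mean_x = sum(lagged) / n
    mean_dy = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_dy) for x, d in zip(lagged, dy))
    gamma = sxy / sxx                                 # OLS slope on y_{t-1}
    alpha = mean_dy - gamma * mean_x                  # OLS intercept
    ssr = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, dy))
    se_gamma = math.sqrt(ssr / (n - 2) / sxx)         # standard error of gamma
    return gamma / se_gamma


def looks_stationary(y):
    """Reject the unit-root null at roughly the 5% level."""
    return dickey_fuller_t(y) < DF_CRITICAL_5PCT


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0, 1) for _ in range(500)]  # stationary white noise
    walk = [0.0]
    for e in noise[1:]:
        walk.append(walk[-1] + e)                     # random walk: has a unit root
    print("white noise:", round(dickey_fuller_t(noise), 2), looks_stationary(noise))
    print("random walk:", round(dickey_fuller_t(walk), 2), looks_stationary(walk))
```

White noise should give a large negative statistic and reject the unit root, while the random walk usually stays above the critical value, so the test fails to reject H₀.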
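ARIMA (AutoRegressive Integrated Moving Average) models are used for forecasting time series data. They combine three components:
AutoRegressive (AR, order p): the current value is a linear combination of the previous p values.
Integrated (I, order d): the series is differenced d times to make it stationary.
Moving Average (MA, order q): the current value depends on the previous q forecast errors.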
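To show how the AR and I components fit together, here is a minimal sketch of an ARIMA(1,1,0) forecaster: it differences the series once (I, d = 1), fits an AR(1) to the differences by ordinary least squares (AR, p = 1), and has no MA term (q = 0). This is a simplification: full implementations such as `ARIMA` in `statsmodels.tsa.arima.model` estimate all three parts jointly by maximum likelihood. The helper names `fit_ar1` and `arima_110_forecast` are hypothetical.

```python
def fit_ar1(x):
    """Least-squares estimates of c and phi in x_t = c + phi * x_{t-1} + e_t."""
    prev, curr = x[:-1], x[1:]
    n = len(curr)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def arima_110_forecast(y, steps):
    """Point forecasts from an ARIMA(1,1,0): an AR(1) on the first differences."""
    dy = [b - a for a, b in zip(y[:-1], y[1:])]   # I (d = 1): difference once
    c, phi = fit_ar1(dy)                          # AR (p = 1) fitted to the differences
    level, diff = y[-1], dy[-1]
    forecasts = []
    for _ in range(steps):
        diff = c + phi * diff                     # forecast the next difference
        level += diff                             # integrate: add it back to the level
        forecasts.append(level)
    return forecasts


history = [10, 12, 15, 17, 20, 22, 25, 27, 30, 33]
print(arima_110_forecast(history, steps=3))
```

The "integrate" step in the loop is what the I in ARIMA refers to: the model forecasts differences, and cumulative summation turns them back into levels.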
SARIMA (Seasonal ARIMA) models add seasonal components to ARIMA models.
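A SARIMA model is denoted as SARIMA(p,d,q)(P,D,Q)s, where:
p, d, q: the non-seasonal autoregressive order, degree of differencing, and moving-average order (as in ARIMA).
P, D, Q: the seasonal autoregressive order, degree of seasonal differencing, and seasonal moving-average order.
s: the number of time steps in one seasonal cycle (e.g., 12 for monthly data with a yearly pattern).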
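The general form of a SARIMA model is:
\[ \Phi_P(B^s)\,\phi_p(B)\,(1 - B)^d\,(1 - B^s)^D\,y_t = \Theta_Q(B^s)\,\theta_q(B)\,\varepsilon_t \]
Where:
\(B\) is the backshift operator, so \(B y_t = y_{t-1}\) and \(B^s y_t = y_{t-s}\).
\(\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p\) and \(\Phi_P(B^s) = 1 - \Phi_1 B^s - \dots - \Phi_P B^{Ps}\) are the non-seasonal and seasonal AR polynomials.
\(\theta_q(B) = 1 + \theta_1 B + \dots + \theta_q B^q\) and \(\Theta_Q(B^s) = 1 + \Theta_1 B^s + \dots + \Theta_Q B^{Qs}\) are the non-seasonal and seasonal MA polynomials.
\((1 - B)^d\) and \((1 - B^s)^D\) apply non-seasonal and seasonal differencing.
\(\varepsilon_t\) is white noise.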
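SARIMA combines its seasonal and non-seasonal parts by multiplying lag polynomials in the backshift operator \(B\) (where \(B^k y_t = y_{t-k}\)). As an illustration of that multiplicative structure, not a fitting routine, the sketch below stores each polynomial as a list whose index k holds the coefficient of \(B^k\) and expands products by convolution. The helpers `lag_poly` and `poly_mul` and the example values \(\phi_1 = 0.5\), \(\Phi_1 = 0.3\), \(s = 4\) are hypothetical.

```python
def lag_poly(coeffs, step=1):
    """Build 1 + c1*B**step + c2*B**(2*step) + ... as a coefficient list.

    Index k of the returned list is the coefficient of B**k.
    """
    poly = [0.0] * (len(coeffs) * step + 1)
    poly[0] = 1.0
    for k, c in enumerate(coeffs, start=1):
        poly[k * step] = c
    return poly


def poly_mul(a, b):
    """Multiply two lag polynomials (convolution of their coefficient lists)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


s = 4  # season length, e.g. quarterly data (illustrative)

# AR side of SARIMA(1,0,0)(1,0,0)_4: (1 - 0.5B)(1 - 0.3B^4)
ar_side = poly_mul(lag_poly([-0.5]), lag_poly([-0.3], step=s))
print(ar_side)    # [1.0, -0.5, 0.0, 0.0, -0.3, 0.15]

# Differencing with d = 1, D = 1: (1 - B)(1 - B^4)
diff_side = poly_mul(lag_poly([-1.0]), lag_poly([-1.0], step=s))
print(diff_side)  # [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]
```

The differencing product shows that one regular plus one seasonal difference computes \(y_t - y_{t-1} - y_{t-4} + y_{t-5}\). The AR product shows that combining a non-seasonal and a seasonal AR term implicitly adds a lag-\((s+1)\) term whose coefficient is fixed by \(\phi_1 \Phi_1\) rather than estimated separately.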
Compare the forecasting performance of ARIMA, SARIMA, and Holt-Winters models by their RMSE, MAE, and MAPE (%).
To compare time series forecasting models, we use several metrics:
Root Mean Square Error (RMSE): Measures the square root of the average squared differences between predicted and actual values.
Mean Absolute Error (MAE): Measures the average absolute differences between predicted and actual values.
Mean Absolute Percentage Error (MAPE): Measures the average absolute difference between predicted and actual values, expressed as a percentage of the actual values. It is undefined when any actual value is zero.
Lower values of these metrics indicate better model performance.
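These definitions translate directly into code. A minimal pure-Python sketch, with hypothetical function names, that scores one forecast against the actual values of a held-out period:

```python
import math


def _check(actual, predicted):
    """Validate that both sequences are non-empty and aligned."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")


def rmse(actual, predicted):
    """Root mean square error."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100, 200, 300]
predicted = [110, 190, 300]
print(f"RMSE={rmse(actual, predicted):.3f}  MAE={mae(actual, predicted):.3f}  "
      f"MAPE={mape(actual, predicted):.2f}%")
```

Because RMSE squares errors before averaging, it penalizes a few large misses more heavily than MAE does. MAPE is scale-free, which makes it convenient for comparing series of different magnitudes, but it breaks down when actual values are zero, which is why the sketch raises an error in that case.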