This project presents a time series analysis focused on monthly electricity data from Spain. The dataset originates from Eurostat and is titled “Supply, transformation and consumption of electricity - monthly data” (DOI: 10.2908/NRG_CB_EM).
The main objectives of this project are:
Explore the monthly series and assess seasonality, trend, and stationarity
Identify, fit, and diagnose candidate models (SARIMA, ETS, and STL-based hybrids)
Compare forecast accuracy on a held-out test set and select a final model
Libraries used:
library(readxl)
library(dplyr)
library(forecast)
library(urca)
library(lmtest)
library(tseries)
library(astsa)
library(FinTS)
library(Metrics)
library(ggplot2)
Read the data from the Excel file.
data <- suppressMessages(read_excel("data.xlsx", col_names = FALSE))
df <- as.data.frame(t(data))
colnames(df) <- c("Date", "Spain")
df <- df %>%
filter(!is.na(Date)) %>%
slice(-1) %>%
mutate(Date = as.Date(paste(Date, "01", sep = "-"), format = "%Y-%m-%d"),
Spain = round(as.numeric(Spain), 3))
rownames(df) <- df$Date
df$Date <- NULL
head(df, 3)
## Spain
## 2008-01-01 24597
## 2008-02-01 23096
## 2008-03-01 21690
tail(df, 3)
## Spain
## 2024-12-01 19590.32
## 2025-01-01 20745.94
## 2025-02-01 18208.06
The dataset should be checked for any missing values:
any(is.na(df))
## [1] FALSE
# expected number of months from 2008-01 to 2025-02, minus observed rows
(12*(24-8+1) + 2) - nrow(df)
## [1] 0
Since there is no missing data, the variable can be converted to a time series object to enable proper temporal analysis and model fitting.
ts_data <- ts(df$Spain, start = c(2008, 1), frequency = 12)
Number of observations:
length(ts_data)
## [1] 206
Time series plot:
plot(ts_data,
ylab = "Electricity Consumption (GWh)",
xlab = "Time (Monthly)",
main = "Monthly Electricity Consumption in Spain")
Main observations from the time series plot:
Clear seasonal pattern with regular annual cycles
Variance appears stable (no visible heteroscedasticity)
Mild downward trend over time
No visible level shifts or structural breaks
For the remaining exploratory analysis, the dataset is split into training and test sets, so that model diagnostics are performed only on the training portion.
\[20\% \text{ of } 206 \approx 40\]
n <- length(ts_data)
# train: end before last 40 obs
ts_train <- window(ts_data, end = time(ts_data)[n - 40])
# test: start at the last 40 obs
ts_test <- window(ts_data, start = time(ts_data)[n - 39])
cat("Train data has", length(ts_train), "obervations.\n")
## Train data has 166 obervations.
cat("Test data has", length(ts_test), "obervations.\n")
## Test data has 40 obervations.
cat("Split proportion:", round(100 * length(ts_train) / n, 2),
"/", round(100 * length(ts_test) / n, 2), "\n")
## Split proportion: 80.58 / 19.42
cat("Train starts at", start(ts_train), "and ends at", end(ts_train), "\n")
## Train starts at 2008 1 and ends at 2021 10
cat("Test starts at", start(ts_test), "and ends at", end(ts_test), "\n")
## Test starts at 2021 11 and ends at 2025 2
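The analysis below relies on this single chronological split. As an aside, rolling-origin evaluation is also possible via forecast::tsCV(); a minimal sketch, using a seasonal naive forecaster purely for illustration:
# rolling-origin forecast errors (seasonal naive used only as an example)
e <- tsCV(ts_train, forecastfunction = function(y, h) snaive(y, h = h), h = 12)
sqrt(mean(e^2, na.rm = TRUE)) # RMSE across all rolling origins and horizons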
A key consideration before modeling is whether the series should be transformed. Common transformations include logarithmic scaling or differencing to stabilize variance or achieve stationarity. This decision will be guided by visual inspection and formal statistical tests.
Using the Breusch-Pagan test:
\[ \begin{aligned} \text{H}_0\!:& \quad \text{Constant variance (homoscedasticity)} \\ \text{H}_1\!:& \quad \text{Non-constant variance (heteroscedasticity)} \end{aligned} \]
# prepare the data
df <- data.frame(
time = time(ts_train),
value = as.numeric(ts_train)
)
# apply the test
model <- lm(value ~ time, data = df)
bptest(model)
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.35006, df = 1, p-value = 0.5541
The Breusch-Pagan test yields a p-value of 0.5541. Since the p-value is substantially greater than conventional significance levels (e.g., 0.05), there is no evidence to reject the null hypothesis. The residuals do not exhibit significant heteroscedasticity; the assumption of constant variance holds.
Nonetheless, a Box-Cox transformation can still be applied to check whether it produces any noticeable variance stabilization.
par(mfrow = c(2, 1))
plot(df$value,
main = "Original Data",
ylab = "Original",
xlab = "Time",
type = "l")
lambda <- BoxCox.lambda(df$value)
plot(BoxCox(df$value, lambda),
main = paste("Box-Cox Transformation (λ =", round(lambda, 2), ")"),
ylab = "Transformed",
xlab = "Time",
type = "l")
Since the Box-Cox transformation does not produce a noticeable difference in the variance structure and the Breusch-Pagan test indicated homoscedasticity, no transformation will be applied to the original series.
As a first step, the ACF and PACF plots of the training series will be examined to assess temporal dependence and identify potential signs of non-stationarity, such as slow decay in autocorrelations. These plots help reveal if differencing may be required before model estimation.
tsdisplay(ts_train, main = "Time series diagnosis")
The time series plot and the ACF reveal strong autocorrelations that decay slowly, indicating the presence of persistent structure over time. The PACF also shows significant early lags. These patterns are characteristic of a non-stationary series, suggesting that differencing is likely required before fitting a model.
tsdisplay(diff(ts_train, lag = 12), main = "Diagnosis after seasonal differencing (lag = 12)")
tsdisplay(diff(diff(ts_train), lag = 12), main = "Seasonal differencing + first differencing")
After applying both seasonal differencing (lag = 12) and first differencing, the transformed series appears substantially more stable. The ACF shows a rapid drop-off, and most lags lie within the confidence bounds. The PACF displays limited significant spikes, further supporting stationarity. These patterns suggest that the combined differencing successfully removed trend and seasonal components, resulting in a stationary series.
To draw a more robust inference regarding stationarity, formal statistical tests should be applied in addition to visual inspection. Specifically, the Augmented Dickey-Fuller (ADF) and KPSS tests will be used to assess the presence or absence of unit roots.
Augmented Dickey-Fuller (ADF) test:
\[ \begin{aligned} \text{H}_0\!:& \quad \text{The series has a unit root (non-stationary)} \\ \text{H}_1\!:& \quad \text{The series is stationary} \end{aligned} \]
# test the original series, probably non-stationary
adf.test(ts_train)
##
## Augmented Dickey-Fuller Test
##
## data: ts_train
## Dickey-Fuller = -3.0043, Lag order = 5, p-value = 0.1572
## alternative hypothesis: stationary
# test after first differencing and seasonal differencing
adf.test(diff(diff(ts_train, lag = 12), differences = 1))
## Warning in adf.test(diff(diff(ts_train, lag = 12), differences = 1)): p-value
## smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(diff(ts_train, lag = 12), differences = 1)
## Dickey-Fuller = -6.9048, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The ADF test on the original series returns a p-value of 0.1572. This is above the conventional threshold (e.g., 0.05), so there is insufficient evidence to reject the null hypothesis. The original series is likely non-stationary.
After applying seasonal and first differencing, the ADF test yields a p-value less than 0.01. This provides strong evidence against the null hypothesis, indicating that the differenced series is stationary.
KPSS test:
\[ \begin{aligned} \text{H}_0\!:& \quad \text{The series is stationary} \\ \text{H}_1\!:& \quad \text{The series has a unit root (non-stationary)} \end{aligned} \]
kpss_test <- ur.kpss(diff(diff(ts_train, lag = 12)))
summary(kpss_test)
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0199
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The KPSS test was performed on the differenced series (type = “mu”, with 4 lags). The test statistic is 0.0199, which is well below all critical values at common significance levels (e.g., 0.463 at 5%). Since the test statistic does not exceed the critical values, the null hypothesis of stationarity cannot be rejected. This test supports the conclusion that the differenced series is stationary, reinforcing the result from the ADF test.
In conclusion, the original series is non-stationary, and stationarity is achieved after applying one seasonal and one regular difference.
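As a quick cross-check, the differencing heuristics in the forecast package can be consulted; a minimal sketch, whose suggestions are expected to agree with the one seasonal and one regular difference applied above:
# unit-root based heuristics for the number of differences
nsdiffs(ts_train) # suggested number of seasonal differences
ndiffs(diff(ts_train, lag = 12)) # suggested regular differences after seasonal differencing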
From the exploratory analysis section, it is already established that the series exhibits strong seasonality and required both seasonal and non-seasonal differencing to achieve stationarity. This justifies the use of a SARIMA model, which explicitly accounts for both seasonal and non-seasonal components.
Model identification will proceed by analyzing the ACF and PACF of the differenced series and by evaluating alternative specifications using information criteria (e.g., AICc) and residual diagnostics.
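For reference, AICc augments AIC with a small-sample penalty, where \(k\) is the number of estimated parameters and \(n\) the sample size:
\[ \text{AICc} = \text{AIC} + \frac{2k(k+1)}{n - k - 1} \]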
acf2(diff(diff(ts_train, 12)))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF -0.26 -0.07 -0.09 -0.05 -0.04 0.06 -0.06 0.09 0.17 -0.13 0.04 -0.41
## PACF -0.26 -0.14 -0.16 -0.15 -0.15 -0.04 -0.12 0.00 0.19 -0.01 0.08 -0.42
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.04 0.20 0.10 -0.07 0.04 0.01 -0.02 -0.05 -0.11 0.06 0.16 0.01
## PACF -0.28 -0.01 0.07 -0.03 -0.07 0.04 0.03 0.00 0.01 -0.08 0.06 -0.16
## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF -0.09 -0.09 -0.06 0.00 0.10 -0.08 0.01 0.11 0.03 -0.11 0.03 -0.08
## PACF -0.20 -0.10 -0.05 -0.15 -0.03 -0.04 -0.09 -0.04 0.03 -0.10 0.10 -0.13
## [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.12 0.12 -0.03 -0.05 0.03 -0.06 -0.01 -0.03 -0.04 0.10 -0.01 -0.03
## PACF -0.13 0.04 0.10 -0.03 0.08 0.02 -0.05 -0.02 0.04 -0.09 -0.03 -0.10
From the ACF and PACF plots of the seasonally and first-differenced series (diff(diff(ts_train, 12))), the following model components can be inferred:
Seasonal components (period = 12): the single pronounced ACF spike at lag 12 (-0.41), while the PACF decays across the seasonal lags (12, 24, 36), suggests a seasonal MA(1) term.
Non-seasonal components: the significant lag-1 ACF and PACF (both -0.26) point to low-order non-seasonal AR and/or MA terms.
SARIMA(1,1,1)(1,1,1)[12] is a reasonable model candidate based on these diagnostics.
fit <- auto.arima(ts_train,
ic = "aicc",
seasonal = TRUE,
stepwise = TRUE,
approximation = TRUE)
summary(fit)
## Series: ts_train
## ARIMA(2,0,2)(2,1,0)[12] with drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sar2 drift
## 0.4258 0.3448 0.0208 -0.1976 -0.6338 -0.2803 -19.5295
## s.e. 1.0934 0.8809 1.0919 0.3903 0.0846 0.0837 9.3086
##
## sigma^2 = 549015: log likelihood = -1235.41
## AIC=2486.82 AICc=2487.82 BIC=2511.12
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -3.105525 697.2632 501.8618 -0.1185414 2.429774 0.6088613
## ACF1
## Training set 0.005366936
SARIMA(2,0,2)(2,1,0)[12] with drift is also a reasonable model candidate based on auto.arima.
fit <- auto.arima(ts_train,
ic = "aicc",
max.p = 5,
d = 1,
max.q = 5,
max.P = 3,
D = 1,
max.Q = 3,
seasonal = TRUE,
stepwise = FALSE,
approximation = FALSE)
summary(fit)
## Series: ts_train
## ARIMA(0,1,2)(3,1,0)[12]
##
## Coefficients:
## ma1 ma2 sar1 sar2 sar3
## -0.4815 -0.1470 -0.7172 -0.4395 -0.2032
## s.e. 0.0842 0.0906 0.0854 0.0981 0.0894
##
## sigma^2 = 547459: log likelihood = -1229.25
## AIC=2470.5 AICc=2471.07 BIC=2488.68
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 10.77535 698.6388 508.4503 -0.03317999 2.45886 0.6168546
## ACF1
## Training set 0.01111596
SARIMA(0,1,2)(3,1,0)[12] is also a reasonable model candidate based on a more customized auto.arima search.
To identify the optimal SARIMA model, a grid search can also be performed (an even more customized alternative to auto.arima) over a range of seasonal and non-seasonal parameters. Each model is evaluated using AICc, which balances fit and complexity. This approach allows for a systematic comparison of candidate models, guided by penalized likelihood. While it increases the risk of overfitting due to the large number of combinations, safeguards are applied, such as restricting the total number of parameters, to limit model complexity.
# full grid search
grid <- expand.grid(p = 0:5,
d = 0:2,
q = 0:5,
P = 0:5,
D = 1:3,
Q = 0:5)
# filtered grid search
grid <- subset(grid, (p + q + P + Q) <= 7)
# model fitting and evaluation
results <- lapply(seq_len(nrow(grid)), function(i) {
row <- grid[i, ]
model_str <- paste0("(", row$p, ",", row$d, ",", row$q, ")(",
row$P, ",", row$D, ",", row$Q, ")[12]")
if (i %% 100 == 0 || i == 1 || i == nrow(grid)) {
cat(sprintf("[STATUS] At model %d of %d: %s\n", i, nrow(grid), model_str))
}
tryCatch({
fit <- Arima(ts_train,
order = c(row$p, row$d, row$q),
seasonal = list(order = c(row$P, row$D, row$Q), period = 12),
include.drift = FALSE)
data.frame(aicc = fit$aicc, model = model_str)
}, error = function(e) {
cat(sprintf("[ERROR] In model %d of %d: %s -> %s\n", i, nrow(grid), model_str, e$message))
NULL
})
})
# remove failed models
results <- Filter(Negate(is.null), results)
# save the sorted results
results_df <- do.call(rbind, results)
results_df <- results_df[order(results_df$aicc), ]
write.csv(results_df, "results.csv", row.names = FALSE)
results_df <- read.csv("results.csv")
head(results_df, 3)
## aicc model
## 1 2224.473 (1,2,2)(0,3,3)[12]
## 2 2225.650 (1,2,3)(0,3,3)[12]
## 3 2225.682 (2,2,2)(0,3,3)[12]
Current candidates for the optimal SARIMA model:
SARIMA(1,1,1)(1,1,1)[12]
SARIMA(2,0,2)(2,1,0)[12]
SARIMA(0,1,2)(3,1,0)[12]
SARIMA(1,2,2)(0,3,3)[12]
models <- list(
list(order = c(1,1,1), seasonal = c(1,1,1)),
list(order = c(2,0,2), seasonal = c(2,1,0)),
list(order = c(0,1,2), seasonal = c(3,1,0)),
list(order = c(1,2,2), seasonal = c(0,3,3))
)
labels <- c("SARIMA(1,1,1)(1,1,1)[12]",
"SARIMA(2,0,2)(2,1,0)[12]",
"SARIMA(0,1,2)(3,1,0)[12]",
"SARIMA(1,2,2)(0,3,3)[12]")
plot(ts_train, type = "l", col = "black", lwd = 1.5,
main = "Fitted Values from SARIMA Models vs Training Data",
ylab = "Electricity Consumption", xlab = "Time")
colors <- c("blue", "red", "darkgreen", "purple")
for (i in seq_along(models)) {
fit <- Arima(ts_train,
order = models[[i]]$order,
seasonal = list(order = models[[i]]$seasonal, period = 12))
lines(fitted(fit), col = colors[i], lwd = 1.5)
}
legend("bottomleft",
legend = c("Observed", labels),
col = c("black", colors),
lty = 1,
lwd = 1.5)
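Alongside the visual comparison, the candidates' AICc values can be tabulated directly; a minimal sketch reusing the models and labels lists above, assuming every specification converges without errors:
# fit each candidate once and collect its AICc
aiccs <- sapply(seq_along(models), function(i) {
  fit <- Arima(ts_train,
               order = models[[i]]$order,
               seasonal = list(order = models[[i]]$seasonal, period = 12))
  fit$aicc
})
data.frame(model = labels, AICc = round(aiccs, 2))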
Based on the visual alignment of fitted values with the observed training data and considering model complexity, the most favorable choices are:
SARIMA(1,1,1)(1,1,1)[12]
SARIMA(2,0,2)(2,1,0)[12]
SARIMA(0,1,2)(3,1,0)[12]
Evaluation of the First Choice: SARIMA(1,1,1)(1,1,1)[12]
sarima1 <- forecast::Arima(ts_train,
order = c(1,1,1),
seasonal = c(1,1,1))
summary(sarima1)
## Series: ts_train
## ARIMA(1,1,1)(1,1,1)[12]
##
## Coefficients:
## ar1 ma1 sar1 sma1
## 0.4499 -0.8689 0.0579 -0.9983
## s.e. 0.1252 0.0830 0.0894 0.2415
##
## sigma^2 = 424234: log likelihood = -1221.8
## AIC=2453.6 AICc=2454.01 BIC=2468.75
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 10.45888 617.0807 448.1913 -0.02916893 2.165709 0.5437481
## ACF1
## Training set -0.03951362
# TRUE if significantly different from zero (all should be)
0.0579 / 0.0894 >= 1.96
## [1] FALSE
sar1 is not significant!
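The same ratio test can be applied to every coefficient at once; a small sketch (the helper name coef_ztest is hypothetical):
# z-ratios for all coefficients; significant if |estimate/se| >= 1.96
coef_ztest <- function(fit) {
  est <- coef(fit)
  se  <- sqrt(diag(vcov(fit)))
  data.frame(estimate = est, se = se, z = est / se,
             significant = abs(est / se) >= 1.96)
}
coef_ztest(sarima1)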
# correlation between parameters (should be <.7)
vcov_mat <- vcov(sarima1)
stddevs <- sqrt(diag(vcov_mat))
cov2cor(vcov_mat)
## ar1 ma1 sar1 sma1
## ar1 1.000000000 -0.81456826 0.14406044 -0.005351725
## ma1 -0.814568259 1.00000000 -0.14824210 0.004377140
## sar1 0.144060443 -0.14824210 1.00000000 -0.022012213
## sma1 -0.005351725 0.00437714 -0.02201221 1.000000000
checkresiduals(sarima1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1)(1,1,1)[12]
## Q* = 31.304, df = 20, p-value = 0.0513
##
## Model df: 4. Total lags used: 24
cat("The residuals have mean", mean(residuals(sarima1)), "and variance", var(residuals(sarima1)))
## The residuals have mean 10.45888 and variance 382986.3
qqnorm(residuals(sarima1)); qqline(residuals(sarima1))
Evaluation of the Second Choice: SARIMA(2,0,2)(2,1,0)[12]
sarima2 <- forecast::Arima(ts_train,
order = c(2,0,2),
seasonal = c(2,1,0))
summary(sarima2)
## Series: ts_train
## ARIMA(2,0,2)(2,1,0)[12]
##
## Coefficients:
## Warning in sqrt(diag(x$var.coef)): NaNs produced
## ar1 ar2 ma1 ma2 sar1 sar2
## 0.8327 0.0659 -0.3693 -0.1036 -0.6332 -0.2798
## s.e. 0.0727 0.0940 0.0510 NaN NaN NaN
##
## sigma^2 = 556579: log likelihood = -1236.98
## AIC=2487.95 AICc=2488.72 BIC=2509.21
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -83.00719 704.4336 516.3341 -0.4976523 2.503219 0.6264192
## ACF1
## Training set -0.006377222
0.0659 / 0.0940 >= 1.96
## [1] FALSE
ar2 is not significant!
vcov_mat <- vcov(sarima2)
stddevs <- sqrt(diag(vcov_mat))
## Warning in sqrt(diag(vcov_mat)): NaNs produced
cov2cor(vcov_mat)
## Warning in cov2cor(vcov_mat): diag(V) had non-positive or NA entries; the
## non-finite result may be dubious
## ar1 ar2 ma1 ma2 sar1 sar2
## ar1 1.0000000 -0.9574975 0.3497048 NaN NaN NaN
## ar2 -0.9574975 1.0000000 -0.8478147 NaN NaN NaN
## ma1 0.3497048 -0.8478147 1.0000000 NaN NaN NaN
## ma2 NaN NaN NaN 1 NaN NaN
## sar1 NaN NaN NaN NaN 1 NaN
## sar2 NaN NaN NaN NaN NaN 1
checkresiduals(sarima2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(2,1,0)[12]
## Q* = 48.363, df = 18, p-value = 0.0001329
##
## Model df: 6. Total lags used: 24
cat("The residuals have mean", mean(residuals(sarima2)), "and variance", var(residuals(sarima2)))
## The residuals have mean -83.00719 and variance 492302.2
qqnorm(residuals(sarima2)); qqline(residuals(sarima2))
Evaluation of the Third Choice: SARIMA(0,1,2)(3,1,0)[12]
sarima3 <- forecast::Arima(ts_train,
order = c(0,1,2),
seasonal = c(3,1,0))
summary(sarima3)
## Series: ts_train
## ARIMA(0,1,2)(3,1,0)[12]
##
## Coefficients:
## ma1 ma2 sar1 sar2 sar3
## -0.4815 -0.1470 -0.7172 -0.4395 -0.2032
## s.e. 0.0842 0.0906 0.0854 0.0981 0.0894
##
## sigma^2 = 547459: log likelihood = -1229.25
## AIC=2470.5 AICc=2471.07 BIC=2488.68
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 10.77535 698.6388 508.4503 -0.03317999 2.45886 0.6168546
## ACF1
## Training set 0.01111596
Most coefficients are clearly significant; only ma2 falls just short of the 1.96 threshold (0.1470 / 0.0906 ≈ 1.62).
vcov_mat <- vcov(sarima3)
stddevs <- sqrt(diag(vcov_mat))
cov2cor(vcov_mat)
## ma1 ma2 sar1 sar2 sar3
## ma1 1.0000000 -0.31133605 0.1535299 0.08440160 -0.21314672
## ma2 -0.3113361 1.00000000 -0.1135561 0.06847138 0.03202671
## sar1 0.1535299 -0.11355608 1.0000000 0.57568805 0.29470067
## sar2 0.0844016 0.06847138 0.5756881 1.00000000 0.52754763
## sar3 -0.2131467 0.03202671 0.2947007 0.52754763 1.00000000
No parameter correlation exceeds the 0.7 threshold (the largest is 0.58, between sar1 and sar2).
checkresiduals(sarima3)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,2)(3,1,0)[12]
## Q* = 44.752, df = 19, p-value = 0.0007424
##
## Model df: 5. Total lags used: 24
cat("The residuals have mean", mean(residuals(sarima3)), "and variance", var(residuals(sarima3)))
## The residuals have mean 10.77535 and variance 490937.5
qqnorm(residuals(sarima3)); qqline(residuals(sarima3))
Since no model fully satisfied the diagnostic criteria for a well-specified model (namely, uncorrelated, homoscedastic, and normally distributed residuals), further tuning is necessary.
After extensive testing of alternative SARIMA specifications and parameter adjustments, this was the best-performing configuration identified based on diagnostic criteria and overall model adequacy.
fit_arima <- forecast::Arima(ts_train,
order = c(1,1,1),
seasonal = c(1,1,1))
summary(fit_arima)
## Series: ts_train
## ARIMA(1,1,1)(1,1,1)[12]
##
## Coefficients:
## ar1 ma1 sar1 sma1
## 0.4499 -0.8689 0.0579 -0.9983
## s.e. 0.1252 0.0830 0.0894 0.2415
##
## sigma^2 = 424234: log likelihood = -1221.8
## AIC=2453.6 AICc=2454.01 BIC=2468.75
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 10.45888 617.0807 448.1913 -0.02916893 2.165709 0.5437481
## ACF1
## Training set -0.03951362
vcov_mat <- vcov(fit_arima)
stddevs <- sqrt(diag(vcov_mat))
cov2cor(vcov_mat)
## ar1 ma1 sar1 sma1
## ar1 1.000000000 -0.81456826 0.14406044 -0.005351725
## ma1 -0.814568259 1.00000000 -0.14824210 0.004377140
## sar1 0.144060443 -0.14824210 1.00000000 -0.022012213
## sma1 -0.005351725 0.00437714 -0.02201221 1.000000000
checkresiduals(fit_arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1)(1,1,1)[12]
## Q* = 31.304, df = 20, p-value = 0.0513
##
## Model df: 4. Total lags used: 24
cat("The residuals have mean", mean(residuals(fit_arima)), "and variance", var(residuals(fit_arima)))
## The residuals have mean 10.45888 and variance 382986.3
qqnorm(residuals(fit_arima)); qqline(residuals(fit_arima))
# Shapiro-Wilk test
shapiro.test(residuals(fit_arima))
##
## Shapiro-Wilk normality test
##
## data: residuals(fit_arima)
## W = 0.97568, p-value = 0.005097
# Kolmogorov–Smirnov test
ks.test(residuals(fit_arima), "pnorm", mean=mean(residuals(fit_arima)), sd=sd(residuals(fit_arima)))
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: residuals(fit_arima)
## D = 0.083543, p-value = 0.1969
## alternative hypothesis: two-sided
Even though the sar1 coefficient is individually insignificant and ar1 shows correlation with ma1, both contribute meaningfully to whitening the residuals. Their inclusion reduces autocorrelation not accounted for by other components. In time series modeling, residual diagnostics take precedence over individual parameter significance, and retaining such terms is justified when they improve the overall model adequacy.
The residual diagnostics for the model show reasonably good behavior:
The residuals appear centered with no visible trend or structural shift.
The ACF plot shows no significant autocorrelation at most lags.
The histogram indicates approximate symmetry and bell-shaped distribution.
The Ljung-Box test result is:
\[ \begin{aligned} \text{H}_0\!:& \quad \text{No autocorrelation in residuals} \\ \text{H}_1\!:& \quad \text{Residuals are autocorrelated} \end{aligned} \]
The p-value is 0.0513, slightly above the 5% significance level, indicating marginal evidence against the null hypothesis. While not entirely conclusive, this suggests that the residuals are likely uncorrelated, or at least that any remaining autocorrelation is weak.
Normality of the residuals was evaluated using both the Shapiro-Wilk and Kolmogorov-Smirnov tests, along with visual inspection via the Q-Q plot.
\[ \begin{aligned} \text{H}_0\!:& \quad \text{Residuals are normally distributed} \\ \text{H}_1\!:& \quad \text{Residuals deviate from normality} \end{aligned} \]
p-value = 0.005097: reject H0; evidence of non-normality.
\[ \begin{aligned} \text{H}_0\!:& \quad \text{Residuals follow the normal distribution} \\ \text{H}_1\!:& \quad \text{Residuals do not follow the normal distribution} \end{aligned} \]
p-value = 0.1969: Fail to reject H0: no significant evidence of non-normality.
This difference arises because the Shapiro-Wilk test has greater power to detect departures from normality, particularly in small to moderate samples. The Kolmogorov-Smirnov test, on the other hand, is more conservative (especially when its parameters are estimated from the data) and may fail to reject the null hypothesis even for mildly non-normal data.
The Q-Q plot shows tail deviations but an overall acceptable alignment with the theoretical normal distribution.
While the Shapiro-Wilk test suggests a deviation from normality, the K-S test and the graphical analysis support the assumption of approximate normality.
The SARIMA(1,1,1)(1,1,1)[12] model produces residuals that are close enough to white noise, with weak autocorrelation and only mild deviations from normality. Although not perfect, the model satisfies most diagnostic criteria to a reasonable extent. In the absence of clearly superior alternatives, this model may be considered adequate for forecasting purposes.
Having identified the best-fitting ARIMA model, the next step is to test whether a GARCH model is appropriate for capturing any remaining conditional heteroscedasticity in the residuals.
\[ \begin{aligned} \text{H}_0\!:& \quad \text{No ARCH effects (homoscedasticity)} \\ \text{H}_1\!:& \quad \text{Presence of ARCH effects (heteroscedasticity)} \end{aligned} \]
ArchTest(residuals(fit_arima), lags = 12)
##
## ARCH LM-test; Null hypothesis: no ARCH effects
##
## data: residuals(fit_arima)
## Chi-squared = 9.7974, df = 12, p-value = 0.6337
The test returned a p-value of 0.6337. As the p-value is well above common significance levels, we fail to reject the null hypothesis. There is no evidence of ARCH effects in the residuals; a GARCH extension is not required.
An ETS model will now be applied. ETS models are well-suited for time series with strong trend and seasonal components, and unlike ARIMA, they do not require stationarity. This alternative approach will allow for comparison in terms of both in-sample fit and forecasting accuracy. Model components (Error, Trend, Seasonal) are selected automatically based on information criteria.
fit_ets <- ets(ts_train, ic="aicc")
summary(fit_ets)
## ETS(M,N,A)
##
## Call:
## ets(y = ts_train, ic = "aicc")
##
## Smoothing parameters:
## alpha = 0.5563
## gamma = 1e-04
##
## Initial states:
## l = 21799.6093
## s = 973.9892 -241.2593 -693.6657 -510.1486 606.8107 1472.184
## -635.261 -1221.312 -1882.068 433.5543 -120.8753 1818.051
##
## sigma: 0.0318
##
## AIC AICc BIC
## 3021.333 3024.533 3068.013
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -28.62181 636.3201 487.9749 -0.2087385 2.342725 0.5920137
## ACF1
## Training set 0.03417389
The ETS model selected based on AICc is: ETS(M,N,A)
This configuration indicates:
Error: Multiplicative: the impact of random shocks scales with the level of the series
Trend: None: no long-term directional movement is modeled
Seasonality: Additive: seasonal fluctuations are constant in magnitude across time
This model structure reflects a stable seasonal pattern without an explicit trend, consistent with the behavior observed after differencing the original series.
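In the standard ETS state-space taxonomy (Hyndman et al.), the ETS(M,N,A) form can be sketched as, with \(m = 12\):
\[ \begin{aligned} y_t &= (\ell_{t-1} + s_{t-m})(1 + \varepsilon_t) \\ \ell_t &= \ell_{t-1} + \alpha\,(\ell_{t-1} + s_{t-m})\,\varepsilon_t \\ s_t &= s_{t-m} + \gamma\,(\ell_{t-1} + s_{t-m})\,\varepsilon_t \end{aligned} \]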
checkresiduals(fit_ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,N,A)
## Q* = 47.728, df = 24, p-value = 0.002727
##
## Model df: 0. Total lags used: 24
Applying the checkresiduals function to the model makes it evident that the residuals display significant autocorrelation, as confirmed by the Ljung-Box test (p-value = 0.0027). The residual diagnostics indicate that the ETS model does not adequately capture the underlying dynamics of the series.
The ETS framework already encompasses the Holt-Winters family of models as special cases, so they will not be tested separately.
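For reference, the additive Holt-Winters seasonal model corresponds to ETS(A,A,A) in this framework, so it could be reproduced, if desired, by fixing the components; a minimal sketch:
# force the Holt-Winters additive configuration within ets()
fit_hw <- ets(ts_train, model = "AAA", damped = FALSE)
fit_hw$aicc # compare with fit_ets$aicc of the auto-selected ETS(M,N,A)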
The next approach involves applying a Seasonal and Trend decomposition using Loess (STL), followed by modeling the seasonally adjusted series. This method, implemented via the stlm() function, decomposes the original series into trend, seasonal, and remainder components in a flexible, nonparametric way. After decomposition, the seasonally adjusted series (trend plus remainder) is modeled using ARIMA or ETS, allowing for a hybrid structure that combines non-linear seasonal smoothing with parametric short-term dynamics.
fit_stlm <- stlm(ts_train,
s.window = "periodic",
method = "arima")
summary(fit_stlm)
## Length Class Mode
## stl 664 mstl numeric
## model 19 forecast_ARIMA list
## modelfunction 1 -none- function
## lambda 0 -none- NULL
## x 166 ts numeric
## series 1 -none- character
## m 1 -none- numeric
## fitted 166 ts numeric
## residuals 166 ts numeric
# STL components (seasonal, trend, remainder)
plot(fit_stlm$stl)
# ARIMA fitted to the seasonally adjusted series
summary(fit_stlm$model)
## Series: x
## ARIMA(4,1,1) with drift
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 drift
## 0.4478 0.1155 0.0700 -0.0905 -0.9144 -19.3451
## s.e. 0.1080 0.0930 0.0922 0.0905 0.0814 9.5074
##
## sigma^2 = 385842: log likelihood = -1292.68
## AIC=2599.36 AICc=2600.08 BIC=2621.11
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -8.743192 607.9242 464.0256 -0.1211733 2.227514 0.5629583
## ACF1
## Training set -0.0006021931
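For intuition, stlm(method = "arima") behaves roughly like the following manual pipeline (a sketch only; the actual function also re-seasonalizes automatically when forecasting):
# sketch of the STL + ARIMA pipeline: decompose, model, re-seasonalize
decomp <- stl(ts_train, s.window = "periodic")
seas_adj <- seasadj(decomp) # series with the seasonal component removed
fit_sa <- auto.arima(seas_adj) # ARIMA on the seasonally adjusted series
fc_sa <- forecast(fit_sa, h = 12)
# re-add the last estimated seasonal cycle to the deseasonalized forecast
seas_last <- as.numeric(tail(decomp$time.series[, "seasonal"], 12))
fc_final <- fc_sa$mean + seas_last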
checkresiduals(fit_stlm)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 29.546, df = 24, p-value = 0.2003
##
## Model df: 0. Total lags used: 24
The residual diagnostics from the model show a relatively well-behaved structure:
The residual time plot displays no obvious patterns or structural deviations.
The ACF of the residuals remains mostly within the confidence bounds, with no strong autocorrelation.
The histogram suggests approximate symmetry, though some deviation from normality is visible.
The Ljung-Box test confirms the absence of significant autocorrelation (p-value = 0.2003).
cat("The residuals have mean", mean(residuals(fit_stlm)), "and variance", var(residuals(fit_stlm)))
## The residuals have mean -8.743192 and variance 371734.7
qqnorm(residuals(fit_stlm)); qqline(residuals(fit_stlm))
# Shapiro-Wilk test
shapiro.test(residuals(fit_stlm))
##
## Shapiro-Wilk normality test
##
## data: residuals(fit_stlm)
## W = 0.9794, p-value = 0.01422
# Kolmogorov–Smirnov test
ks.test(residuals(fit_stlm), "pnorm", mean=mean(residuals(fit_stlm)), sd=sd(residuals(fit_stlm)))
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: residuals(fit_stlm)
## D = 0.073808, p-value = 0.3263
## alternative hypothesis: two-sided
Normality of the residuals was evaluated using both the Shapiro-Wilk and Kolmogorov-Smirnov tests, along with visual inspection via the Q-Q plot.
\[ \begin{aligned} \text{H}_0\!:& \quad \text{Residuals are normally distributed} \\ \text{H}_1\!:& \quad \text{Residuals deviate from normality} \end{aligned} \]
p-value = 0.01422: reject H0 at the 5% level but not at the 1% level; some evidence of non-normality.
\[ \begin{aligned} \text{H}_0\!:& \quad \text{Residuals follow the normal distribution} \\ \text{H}_1\!:& \quad \text{Residuals do not follow the normal distribution} \end{aligned} \]
p-value = 0.3263: Fail to reject H0: no significant evidence of non-normality.
This difference arises because the Shapiro-Wilk test has greater power to detect departures from normality, particularly in small to moderate samples. The Kolmogorov-Smirnov test, on the other hand, is more conservative (especially when its parameters are estimated from the data) and may fail to reject the null hypothesis even for mildly non-normal data.
The Q-Q plot shows minor tail deviations but an overall acceptable alignment with the theoretical normal distribution.
While the Shapiro-Wilk test suggests a slight deviation from normality, the K-S test and graphical analysis support the assumption of approximate normality. This level of deviation is not sufficient to disqualify the model for practical use.
To generate forecasts for future electricity consumption, the two best-performing models from the analysis will be used:
fit_arima # SARIMA(1,1,1)(1,1,1)[12]
fit_stlm # STLM(ARIMA)
Forecast accuracy and behavior will be compared visually and numerically using the held-out test set.
# forecast
arima_forecast <- forecast(fit_arima, h = length(ts_test))
# evaluation metrics
forecast::accuracy(arima_forecast, ts_test)
## ME RMSE MAE MPE MAPE MASE
## Training set 10.45888 617.0807 448.1913 -0.02916893 2.165709 0.5437481
## Test set -52.21608 562.8728 465.7346 -0.35018427 2.483365 0.5650317
## ACF1 Theil's U
## Training set -0.03951362 NA
## Test set 0.58261040 0.4207397
# plot the forecast (train+test vs forecast)
autoplot(ts_data, series = "Actual") +
autolayer(arima_forecast, series = "Forecast", PI = FALSE) +
labs(title = "Forecast vs Full Series - SARIMA(1,1,1)(1,1,1)[12]",
x = "Time (Monthly)",
y = "Electricity Consumption (GWh)") +
scale_colour_manual(values = c("Actual" = "black", "Forecast" = "blue"))
# plot the forecast (test vs forecast)
autoplot(ts_test, series = "Test") +
autolayer(arima_forecast, series = "Forecast", PI = FALSE) +
labs(title = "Forecast vs Test Set - SARIMA(1,1,1)(1,1,1)[12]",
x = "Time (Monthly)",
y = "Electricity Consumption (GWh)") +
scale_colour_manual(values = c("Test" = "black", "Forecast" = "blue"))
# forecast
stlm_forecast <- forecast(fit_stlm, h = length(ts_test))
# evaluation metrics
forecast::accuracy(stlm_forecast, ts_test)
## ME RMSE MAE MPE MAPE MASE
## Training set -8.743192 607.9242 464.0256 -0.1231423 2.230672 0.5629583
## Test set -26.131182 563.2452 460.1395 -0.2107488 2.451736 0.5582437
## ACF1 Theil's U
## Training set -0.0006021931 NA
## Test set 0.5836708163 0.4218294
# plot the forecast (train+test vs forecast)
autoplot(ts_data, series = "Actual") +
autolayer(stlm_forecast, series = "Forecast", PI = FALSE) +
labs(title = "Forecast vs Full Series - STLM(ARIMA)",
x = "Time (Monthly)",
y = "Electricity Consumption (GWh)") +
scale_colour_manual(values = c("Actual" = "black", "Forecast" = "blue"))
# plot the forecast (test vs forecast)
autoplot(ts_test, series = "Test") +
autolayer(stlm_forecast, series = "Forecast", PI = FALSE) +
labs(title = "Forecast vs Test Set - STLM(ARIMA)",
x = "Time (Monthly)",
y = "Electricity Consumption (GWh)") +
scale_colour_manual(values = c("Test" = "black", "Forecast" = "blue"))
Forecast accuracy was assessed using forecast::accuracy(), which compares fitted values on the training set and forecast values against the test set. In both cases, the training and test error metrics are very close, indicating no signs of overfitting.
Visually, the predicted values follow the behavior of the actual test data closely, with aligned seasonal patterns and trend shifts.
Both models generalize well to unseen data and can be considered valid for forecasting purposes.
The forecast accuracy of the SARIMA and STLM models is evaluated using standard error metrics:
ME (Mean Error): Measures bias. A value close to 0 is desirable, but this metric can be misleading due to cancellation of positive and negative errors.
RMSE (Root Mean Squared Error): Sensitive to large deviations. Useful when large errors are particularly undesirable. Penalizes outliers more heavily.
MAE (Mean Absolute Error): Interpretable and robust. Less sensitive to outliers than RMSE. A solid general-purpose metric.
MPE (Mean Percentage Error): Can produce misleading values when actuals approach zero. May result in large positive or negative values.
MAPE (Mean Absolute Percentage Error): Intuitive and scale-independent, but unreliable when actual values are near zero.
MASE (Mean Absolute Scaled Error): Scale-free and comparable across datasets. Values < 1 indicate better performance than the naive forecast. Reliable and interpretable.
ACF1 (Lag-1 Autocorrelation of Residuals): Measures autocorrelation in residuals. Ideal values are near 0, indicating residuals are uncorrelated.
Theil’s U: Compares performance to a naive benchmark. Values < 1 indicate better-than-naive forecasts.
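As a cross-check, some of these metrics can be recomputed directly with the Metrics package loaded at the start; a minimal sketch whose values should match the accuracy() test-set row up to rounding:
# recompute test-set RMSE and MAE by hand
actual <- as.numeric(ts_test)
pred <- as.numeric(arima_forecast$mean)
Metrics::rmse(actual, pred)
Metrics::mae(actual, pred)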
acc_arima <- forecast::accuracy(arima_forecast, ts_test)
acc_stlm <- forecast::accuracy(stlm_forecast, ts_test)
comparison <- rbind(
SARIMA = acc_arima["Test set", ],
STLM = acc_stlm["Test set", ]
)
round(comparison, 3)
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## SARIMA -52.216 562.873 465.735 -0.350 2.483 0.565 0.583 0.421
## STLM -26.131 563.245 460.140 -0.211 2.452 0.558 0.584 0.422
The forecast accuracy metrics for SARIMA and STLM are very similar across all indicators, reflecting comparable overall performance.
The 95% prediction intervals (PI) represent the range within which future observations are expected to fall with 95% confidence.
# extract forecast time indices
time_vals <- time(arima_forecast$mean)
df_arima <- data.frame(
time = time_vals,
lo95 = arima_forecast$lower[,2],
hi95 = arima_forecast$upper[,2]
)
df_stlm <- data.frame(
time = time(stlm_forecast$mean),
lo95 = stlm_forecast$lower[,2],
hi95 = stlm_forecast$upper[,2]
)
# plot the models 95% PI bands
ggplot() +
geom_ribbon(data = df_arima, aes(x = time, ymin = lo95, ymax = hi95, fill = "SARIMA 95% PI"), alpha = 0.5) +
geom_ribbon(data = df_stlm, aes(x = time, ymin = lo95, ymax = hi95, fill = "STLM 95% PI"), alpha = 0.5) +
autolayer(ts_test, series = "Actual") + # drawn last so the line stays on top of the ribbons
labs(title = "Forecast Comparison: SARIMA vs STLM",
x = "Time (Monthly)",
y = "Electricity Consumption (GWh)") +
scale_fill_manual(values = c("SARIMA 95% PI" = "#0072B2", "STLM 95% PI" = "#D55E00")) +
scale_color_manual(values = c("Actual" = "black")) +
theme_minimal()
Visually, both the SARIMA and STLM forecasts maintain intervals that consistently cover the actual values in the test set, indicating calibrated uncertainty estimates. The width of the intervals remains stable over time.
# accuracy of 95% PI
inside_95_arima <- ts_test >= arima_forecast$lower[,2] & ts_test <= arima_forecast$upper[,2]
inside_95_stlm <- ts_test >= stlm_forecast$lower[,2] & ts_test <= stlm_forecast$upper[,2]
cat("The SARIMA has", mean(inside_95_arima)*100,"% of the real values inside its 95% PI.","\n",
"The STLM has", mean(inside_95_stlm)*100,"% of the real values inside its 95% PI")
## The SARIMA has 100 % of the real values inside its 95% PI.
## The STLM has 100 % of the real values inside its 95% PI
The test observations fall within the 95% intervals for all time points, suggesting that both models produce reliable forecasts.
# avg width of 95% PI
width_95_arima <- arima_forecast$upper[,2] - arima_forecast$lower[,2]
width_95_stlm <- stlm_forecast$upper[,2] - stlm_forecast$lower[,2]
cat("The SARIMA 95% PI has an avg width of", mean(width_95_arima),"\n",
"The STLM 95% PI has an avg width of", mean(width_95_stlm))
## The SARIMA 95% PI has an avg width of 4378.183
## The STLM 95% PI has an avg width of 3639.196
The STLM model produces narrower 95% PI on average compared to SARIMA. This means that the STLM model yields forecasts with lower uncertainty and higher precision. Narrower PIs indicate that the model is more confident in its predictions, which is desirable when coverage is not sacrificed.
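Coverage and width can also be combined into a single number via the Winkler (interval) score, which adds the interval width to a penalty for any observation falling outside the interval; a hedged sketch at the 95% level:
# Winkler score: width plus (2/alpha)-scaled penalties for misses
winkler <- function(actual, lower, upper, alpha = 0.05) {
  width <- upper - lower
  penalty <- (2 / alpha) * (pmax(0, lower - actual) + pmax(0, actual - upper))
  mean(width + penalty)
}
winkler(as.numeric(ts_test), as.numeric(arima_forecast$lower[, 2]), as.numeric(arima_forecast$upper[, 2]))
winkler(as.numeric(ts_test), as.numeric(stlm_forecast$lower[, 2]), as.numeric(stlm_forecast$upper[, 2]))
Lower values are better; a narrower interval is rewarded only when it still covers the observations.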
The chosen model is the STLM with ARIMA applied to the seasonally adjusted component, justified by the following points:
Lower autocorrelation in residuals during model fitting on the training set
Stronger indications that residuals follow a normal distribution
Reduced variance in residuals compared to SARIMA(1,1,1)(1,1,1)[12]
Practically identical forecast accuracy to SARIMA(1,1,1)(1,1,1)[12], with no meaningful difference in performance
Narrower 95% prediction intervals on average, indicating more confident forecasts
These factors support the selection of STLM(ARIMA) as the final model for forecasting monthly electricity consumption.