Introducing calendar structure
Why history alone is not enough
Despite the improvement it brings, the autoregressive model has an obvious weakness: it completely ignores calendar structure.
It is well known that electricity consumption is strongly linked to human behavior. Weekdays do not resemble weekends, and public holidays exhibit very specific profiles. A model that ignores these effects cannot correctly capture certain variations.
A look at the series makes this clear: Monday’s consumption is often closer to that of previous Mondays than to that of the preceding Sunday.
Autoregression captures the inertia of the series, yet it still ignores a trivial piece of information: the date itself. This suggests that the model must be enriched with variables explicitly describing the calendar.
Encoding calendar variables
We introduce several simple yet highly informative calendar variables. These include the day of the week, represented through one-hot encoding, together with binary indicators identifying weekends, public holidays, and bridge days.
Note
One-hot encoding (or "dummy encoding") transforms a categorical variable, such as the day of the week, into several binary columns (0 or 1).
We create as many columns as there are categories. For each row, only one of these columns has the value 1 (the one corresponding to that row's day), while all the others are 0. A single day_of_week column ranging from 0 to 6 would impose a spurious numerical ordering on the days (Sunday is not "six times" Monday), so we create 7 new columns instead:
| Day | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
|---|---|---|---|---|---|---|---|
| Monday | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Tuesday | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Wednesday | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| Sunday | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Unlike autoregressive variables, this information is known in advance for the target day. Therefore, it is not necessary to apply a rolling time window to them. We then construct an enriched representation of each observation:
X = [history of the last N days] + [target day calendar variables]
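The concatenation above can be sketched as follows. This is a minimal illustration, not the tutorial's script: the helper names `calendar_features` and `build_dataset` are hypothetical, the series is synthetic, and only the day-of-week and weekend columns are included to keep the example short.

```python
import numpy as np
import pandas as pd


def calendar_features(dates):
    """One-hot day of week (7 columns) plus a weekend flag (1 column)."""
    dow = pd.DatetimeIndex(dates).weekday.to_numpy()  # Monday=0 ... Sunday=6
    return np.column_stack([np.eye(7)[dow], (dow >= 5).astype(int)])


def build_dataset(series, window):
    """Build (X, y): each row is [last `window` values | target-day calendar]."""
    values = series.to_numpy(dtype=float)
    # Row i holds values[i : i + window], the history preceding target day i + window.
    # The last window has no following target, so it is dropped.
    lags = np.lib.stride_tricks.sliding_window_view(values, window)[:-1]
    target_dates = series.index[window:]
    X = np.hstack([lags, calendar_features(target_dates)])
    y = values[window:]
    return X, y


# Synthetic daily series, for illustration only (2024-01-01 is a Monday)
idx = pd.date_range("2024-01-01", periods=30, freq="D")
load = pd.Series(np.arange(30, dtype=float), index=idx)

X, y = build_dataset(load, window=7)
print(X.shape, y.shape)  # (23, 15) (23,)
```

The calendar columns are computed from the target date, not from the dates inside the window, which is why no rolling window is applied to them.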
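A typical implementation of the calendar features using pandas and NumPy looks like this (only the day-of-week and weekend columns are shown; the holiday and bridge-day indicators additionally require a holiday calendar):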
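import numpy as np
import pandas as pd

def create_calendar_features(dates):
    # Accept lists, Series, or indexes of dates
    dates = pd.DatetimeIndex(pd.to_datetime(dates))
    dow = dates.weekday.to_numpy()         # Monday=0 ... Sunday=6
    dow_oh = np.eye(7)[dow]                # one-hot day of week, shape (n, 7)
    is_weekend = (dow >= 5).astype(int)    # 1 for Saturday and Sunday
    return np.column_stack([dow_oh, is_weekend])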
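The function above does not cover the public-holiday and bridge-day indicators mentioned earlier. One possible way to add them is sketched below. Everything specific here is an assumption: the helper name `holiday_and_bridge_flags` is hypothetical, the holiday list must be supplied by the caller, and a bridge day is taken to be a Monday before a Tuesday holiday or a Friday after a Thursday holiday. The tutorial's script may define these differently.

```python
import numpy as np
import pandas as pd


def holiday_and_bridge_flags(dates, holidays):
    """Return an (n, 2) int array with columns [is_holiday, is_bridge].

    Assumption: a bridge day is a Monday before a Tuesday holiday or a
    Friday after a Thursday holiday, and is not itself a holiday.
    """
    dates = pd.DatetimeIndex(pd.to_datetime(dates)).normalize()
    holidays = pd.DatetimeIndex(pd.to_datetime(holidays)).normalize()
    dow = dates.weekday.to_numpy()  # Monday=0 ... Sunday=6

    is_holiday = dates.isin(holidays)
    next_is_holiday = (dates + pd.Timedelta(days=1)).isin(holidays)
    prev_is_holiday = (dates - pd.Timedelta(days=1)).isin(holidays)

    is_bridge = ~is_holiday & (
        ((dow == 0) & next_is_holiday) | ((dow == 4) & prev_is_holiday)
    )
    return np.column_stack([is_holiday, is_bridge]).astype(int)


# Hypothetical holiday list: a single holiday falling on a Thursday
holidays = ["2024-05-09"]
dates = ["2024-05-09", "2024-05-10", "2024-05-11"]  # Thu, Fri, Sat
flags = holiday_and_bridge_flags(dates, holidays)
print(flags)
# [[1 0]
#  [0 1]
#  [0 0]]
```

These two columns can be stacked next to the day-of-week and weekend columns with `np.column_stack`.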
With the purely autoregressive model, the gradual improvement up to about 30 days suggested that part of the series' structure extended beyond a simple weekly cycle. Adding calendar variables significantly changes this behavior: starting with a 7-day window, performance becomes remarkably stable, and increasing the amount of historical data brings only marginal gains. Between 7 and 120 days, the MAE varies by less than 2%.
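The "less than 2%" figure can be checked directly from the MAE values reported in the script output at the end of this section. The snippet below only redoes that arithmetic; the variable names are illustrative.

```python
# MAE values (MWh) copied from the script output, keyed by window length in days
mae_by_window = {
    7: 26647.41,
    14: 26303.62,
    30: 26648.88,
    60: 26750.22,
    90: 26626.09,
    120: 26530.34,
}

best = min(mae_by_window.values())
worst = max(mae_by_window.values())

# Relative gap between the worst and the best window, in percent
spread_pct = 100 * (worst - best) / best
best_window = min(mae_by_window, key=mae_by_window.get)

print(f"{spread_pct:.2f}% spread, best window = {best_window} days")
# 1.70% spread, best window = 14 days
```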
One plausible explanation is that these variables make certain regular patterns explicit (business days, weekends, public holidays, bridge days) that the autoregressive model previously had to reconstruct implicitly from a longer history. As a result, a relatively short window becomes sufficient.
This improvement is particularly interesting because it does not rely on complex external data, but simply on a better representation of the problem. It confirms that electricity consumption is strongly influenced by calendar effects.
In the remainder of this tutorial, we will use a 14-day window for autoregression: it gives the lowest MAE among the windows tested here (26,303.62 MWh) while keeping the model compact.
At this stage, the model correctly captures both past trends, thanks to its autoregressive component, and the weekly structure, through the calendar variables.
Script implementing calendar variables: scripts/AR_with_calendar.py
Script output:
=== PERSISTENCE BASELINE ===
MAE persistence: 56,428.78 MWh
=== AR + CALENDAR MODEL ===
Window: 7 days
Model MAE: 26,647.41 MWh
Δ vs persistence: -29,781.37
Window: 14 days
Model MAE: 26,303.62 MWh
Δ vs persistence: -30,125.16
Window: 30 days
Model MAE: 26,648.88 MWh
Δ vs persistence: -29,779.89
Window: 60 days
Model MAE: 26,750.22 MWh
Δ vs persistence: -29,678.56
Window: 90 days
Model MAE: 26,626.09 MWh
Δ vs persistence: -29,802.69
Window: 120 days
Model MAE: 26,530.34 MWh
Δ vs persistence: -29,898.44