
Introducing calendar structure

Why history alone is not enough

Despite the improvement it brings, the autoregressive model has an obvious weakness: it completely ignores calendar structure.

It is well known that electricity consumption is strongly linked to human behavior. Weekdays do not resemble weekends, and public holidays exhibit very specific profiles. A model that ignores these effects cannot correctly capture certain variations.

To be convinced of this, one only needs to observe the series: Monday’s consumption is often closer to that of previous Mondays than to that of the preceding Sunday.
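One way to check this on a series is to compare each Monday with the previous Monday (lag 7) and with the preceding Sunday (lag 1). The sketch below is only an illustration: the helper name `mean_abs_gap` and the synthetic series (a flat weekday level with lower weekends) are assumptions, not the tutorial's data.

```python
import numpy as np
import pandas as pd

def mean_abs_gap(series: pd.Series, lag: int, weekday: int = 0) -> float:
    """Mean absolute gap between days of a given weekday and the day `lag` days earlier."""
    gap = (series - series.shift(lag)).abs()
    # Keep only the rows falling on the requested weekday (0 = Monday); NaNs are skipped by mean().
    return float(gap[series.index.weekday == weekday].mean())

# Synthetic daily series, for illustration only: 1000 on weekdays, 800 on weekends.
dates = pd.date_range("2023-01-02", periods=28, freq="D")  # starts on a Monday
load = pd.Series(np.where(dates.weekday >= 5, 800.0, 1000.0), index=dates)

monday_vs_prev_monday = mean_abs_gap(load, lag=7)  # gap to the same weekday one week earlier
monday_vs_sunday = mean_abs_gap(load, lag=1)       # gap to the preceding day
print(monday_vs_prev_monday, monday_vs_sunday)
```

On a real consumption series the lag-7 gap will not be zero, but for Mondays it would typically be expected to be smaller than the lag-1 gap.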

Autoregression captures the inertia of the series, yet it still ignores a trivial piece of information: the date itself. This suggests that the model must be enriched with variables explicitly describing the calendar.

Encoding calendar variables

We introduce a set of simple yet highly informative variables:

  • Day of the week (one-hot encoded),

  • Weekend indicator (binary 0 or 1),

  • Public holiday indicator (binary 0 or 1),

  • "Bridge day" indicator (binary 0 or 1).

Note: One-Hot Encoding

One-Hot Encoding (or "dummy encoding") consists of transforming a categorical variable (such as the day of the week) into several binary columns (0 or 1).

We create as many columns as there are categories. For each row, exactly one of these columns has the value 1 (the one corresponding to that row's day), while all the others are 0. A single day_of_week column ranging from 0 to 6 would impose an artificial numeric ordering on the days (Sunday is not "six times" Monday), so we create 7 new columns instead:

Day        Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday
Monday     1       0        0          0         0       0         0
Tuesday    0       1        0          0         0       0         0
Wednesday  0       0        1          0         0       0         0
...
Sunday     0       0        0          0         0       0         1
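As a minimal sketch, the table above can be reproduced with pandas' `get_dummies`. The example week and the variable names are assumptions chosen for illustration. Declaring the categories explicitly is a design choice: it keeps all 7 columns, in calendar order, even when a sample does not contain every day.

```python
import pandas as pd

day_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# One example week; 2024-01-01 is a Monday.
dates = pd.date_range("2024-01-01", periods=7, freq="D")

# A Categorical with fixed categories guarantees 7 columns in a stable order.
dow = pd.Series(pd.Categorical(dates.day_name(), categories=day_names), index=dates)

# One binary column per day name; exactly one 1 per row.
one_hot = pd.get_dummies(dow, dtype=int)
print(one_hot)
```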

Unlike the autoregressive variables, the calendar variables are known in advance for the target day, so there is no need to apply a rolling time window to them. We then construct an enriched representation of each observation:

X = [history of the last N days] + [target day calendar variables]
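The concatenation can be sketched as follows. This is an illustrative assumption, not the tutorial's script: the function name `build_features` and the toy series are hypothetical, and only the day-of-week and weekend columns are included for the calendar part.

```python
import numpy as np
import pandas as pd

def build_features(load: pd.Series, window: int):
    """Return (X, y): the last `window` daily values plus the target day's calendar columns."""
    values = load.to_numpy(dtype=float)
    dow = load.index.weekday.to_numpy()  # Monday=0 ... Sunday=6
    # Calendar block: 7 one-hot day-of-week columns + 1 weekend flag.
    calendar = np.column_stack([np.eye(7)[dow], (dow >= 5).astype(int)])

    rows, targets = [], []
    for t in range(window, len(values)):
        history = values[t - window:t]                        # days t-window .. t-1
        rows.append(np.concatenate([history, calendar[t]]))   # calendar of target day t
        targets.append(values[t])
    return np.array(rows), np.array(targets)

# Toy series (values 0..29) indexed by consecutive days starting on a Monday.
load = pd.Series(np.arange(30.0), index=pd.date_range("2024-01-01", periods=30, freq="D"))
X, y = build_features(load, window=7)
print(X.shape, y.shape)
```

Each row has `window + 8` columns here. The history block uses only days strictly before the target, whereas the calendar block describes the target day itself.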

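A typical implementation of the calendar columns using pandas and NumPy looks like this (the snippet covers only the day-of-week and weekend columns):

import numpy as np
import pandas as pd

def create_calendar_features(dates):
    # DatetimeIndex handles lists, arrays, and Series alike
    dates = pd.DatetimeIndex(pd.to_datetime(dates))
    dow = dates.weekday.to_numpy()          # Monday=0 ... Sunday=6
    dow_oh = np.eye(7)[dow]                 # one-hot rows, shape (n, 7)
    is_weekend = (dow >= 5).astype(int)     # 1 for Saturday and Sunday
    return np.column_stack([dow_oh, is_weekend])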
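The function above does not build the public holiday and bridge-day indicators. A minimal sketch of those two columns follows, under explicit assumptions: the holiday dates are supplied by the caller (the example date is a placeholder, not an official calendar), and a bridge day is assumed to be a non-holiday Monday before a Tuesday holiday or a non-holiday Friday after a Thursday holiday. The source does not state its definition, and the helper name `holiday_and_bridge_flags` is hypothetical.

```python
import numpy as np
import pandas as pd

def holiday_and_bridge_flags(dates, holidays):
    """Return an (n, 2) integer array: [is_holiday, is_bridge] for each date."""
    dates = pd.DatetimeIndex(pd.to_datetime(dates)).normalize()
    holidays = pd.DatetimeIndex(pd.to_datetime(holidays)).normalize()
    dow = dates.weekday.to_numpy()  # Monday=0 ... Sunday=6

    is_holiday = dates.isin(holidays)
    one_day = pd.Timedelta(days=1)
    prev_is_holiday = (dates - one_day).isin(holidays)
    next_is_holiday = (dates + one_day).isin(holidays)

    # Assumed definition: Monday before a Tuesday holiday, or Friday after a Thursday holiday.
    is_bridge = ~is_holiday & (((dow == 0) & next_is_holiday) | ((dow == 4) & prev_is_holiday))
    return np.column_stack([is_holiday.astype(int), is_bridge.astype(int)])

# Placeholder example: a holiday falling on Thursday 2024-05-09.
dates = pd.date_range("2024-05-08", periods=4, freq="D")  # Wed, Thu, Fri, Sat
flags = holiday_and_bridge_flags(dates, holidays=["2024-05-09"])
print(flags)
```

These two columns can be stacked next to the day-of-week and weekend columns with `np.column_stack`.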

When these variables are added to the autoregressive model, the improvement is immediate: with a 14-day window, the MAE drops to approximately 26,304 MWh, nearly 10% better than the best performance achieved with AR alone (see output below).

With the purely autoregressive model, the gradual improvement up to about 30 days suggested that part of the series' structure extended beyond a simple weekly cycle. Adding calendar variables significantly changes this behavior: starting with a 7-day window, performance becomes remarkably stable, and increasing the amount of historical data brings only marginal gains. Between 7 and 120 days, the MAE varies by less than 2%.
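The stability claim can be checked with a few lines of arithmetic on the MAE values reported in this section's script output. The variable names are illustrative; the numbers are copied from that output.

```python
# MAE (MWh) per history window, copied from the AR + calendar script output.
mae_by_window = {
    7: 26647.41,
    14: 26303.62,
    30: 26648.88,
    60: 26750.22,
    90: 26626.09,
    120: 26530.34,
}
persistence_mae = 56428.78  # persistence baseline from the same output

best_window = min(mae_by_window, key=mae_by_window.get)
best_mae = mae_by_window[best_window]
worst_mae = max(mae_by_window.values())

# Relative spread between the worst and best window, in percent of the best MAE.
spread_pct = 100 * (worst_mae - best_mae) / best_mae
# Relative reduction of the best model versus the persistence baseline.
reduction_pct = 100 * (persistence_mae - best_mae) / persistence_mae

print(f"best window: {best_window} days, spread: {spread_pct:.2f}%, "
      f"reduction vs persistence: {reduction_pct:.1f}%")
```

The spread comes out at about 1.7%, consistent with the "less than 2%" figure, and the best window is 14 days.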

One plausible explanation is that these variables make certain regular patterns explicit (business days, weekends, public holidays, long weekends) that the autoregressive model previously had to reconstruct implicitly from a longer history. As a result, a relatively short window becomes sufficient.

This improvement is particularly interesting because it does not rely on complex external data, but simply on a better representation of the problem. It confirms that electricity consumption is strongly influenced by calendar effects.

In the remainder of this tutorial, we will use a 14-day window for autoregression, as it provides the best performance here while keeping the model compact.

At this stage, our model correctly captures:

  • Past trends (via AR),

  • Weekly structure (via the calendar).

Script implementing calendar variables: scripts/AR_with_calendar.py

Script output:

=== PERSISTENCE BASELINE ===
MAE persistence: 56,428.78 MWh

=== AR + CALENDAR MODEL ===

Window: 7 days
Model MAE: 26,647.41 MWh
Δ vs persistence: -29,781.37

Window: 14 days
Model MAE: 26,303.62 MWh
Δ vs persistence: -30,125.16

Window: 30 days
Model MAE: 26,648.88 MWh
Δ vs persistence: -29,779.89

Window: 60 days
Model MAE: 26,750.22 MWh
Δ vs persistence: -29,678.56

Window: 90 days
Model MAE: 26,626.09 MWh
Δ vs persistence: -29,802.69

Window: 120 days
Model MAE: 26,530.34 MWh
Δ vs persistence: -29,898.44

At this point, our model already captures two essential dimensions: the system's memory and its calendar rhythm. However, a significant portion of the variability remains unexplained. To understand the root of these remaining discrepancies, we must now look beyond the consumption data itself and introduce a major physical factor: the weather.