Appendices

Source Code and Data Repository

The project is structured around two main directories:

data
Contains the datasets used in the tutorial, as well as the consolidated SQLite database.
scripts
Contains the various Python scripts presented throughout the tutorial sections.

Project Overview and Structure

[project_root]
├── data
│   ├── thermal.db
│   │
│   ├── RTE
│   │   └── rte_daily_consumption_2012_2024.csv
│   │
│   └── other_vars
│       ├── 2m_temperature_daily_mean_2012.nc
│       ├── ...
│       ├── 2m_temperature_daily_mean_2024.nc
│       ├── create_db.py
│       └── ddl.txt
│
└── scripts
    ├── AR.py
    ├── AR_with_calendar.py
    ├── persistence.py
    ├── rte_with_preds.py
    └── with_meteo.py
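
After downloading the data and cloning the scripts, a quick check that the local layout matches the tree above can save some debugging. The following is a minimal sketch, not part of the tutorial's scripts: the helper name `missing_paths` and the `EXPECTED` list are introduced here for illustration, and the yearly `.nc` files are skipped because the tree elides them.

```python
from pathlib import Path

# Relative paths copied from the tree above. The yearly .nc files are not
# listed because the tree elides the intermediate years.
EXPECTED = [
    "data/thermal.db",
    "data/RTE/rte_daily_consumption_2012_2024.csv",
    "data/other_vars/create_db.py",
    "data/other_vars/ddl.txt",
    "scripts/AR.py",
    "scripts/AR_with_calendar.py",
    "scripts/persistence.py",
    "scripts/rte_with_preds.py",
    "scripts/with_meteo.py",
]


def missing_paths(project_root):
    """Return the expected relative paths that are not files under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_paths(".")
    print("Layout OK" if not missing else f"Missing: {missing}")
```

Run from the project root, it prints either a confirmation or the list of files that still need to be fetched.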

Data Access

The data directory—which includes the weather data, electricity consumption time series, and the SQLite database used in this tutorial—can be downloaded from the Zenodo Repository.

Script Access

The Python scripts associated with this tutorial are available in the GitHub Repository.

Running the Scripts

From the project root directory, you can run the scripts using the following command:

python scripts/[script_name].py

Example:

python scripts/AR.py

SQLite Database DDL

--
-- File generated with SQLiteStudio v3.4.17 on Tue May 12 14:36:21 2026
--
-- Text encoding used: System
--
PRAGMA foreign_keys = off;
BEGIN TRANSACTION;

-- Table: feature_series
CREATE TABLE IF NOT EXISTS feature_series (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    site_id INTEGER NOT NULL,
    var_name TEXT NOT NULL,
    var_description TEXT,
    unit TEXT,
    short_var_name TEXT,
    series BLOB NOT NULL,
    start_date TEXT NOT NULL,
    end_date TEXT NOT NULL,
    total_values INTEGER NOT NULL,
    original_nan_count INTEGER NOT NULL,
    nan_percentage REAL NOT NULL,
    max_consecutive_nans INTEGER NOT NULL,
    nans_interpolated INTEGER NOT NULL,
    interpolation_method TEXT,
    data_quality TEXT NOT NULL,
    quality_score REAL NOT NULL,
    import_timestamp TEXT NOT NULL,
    origin TEXT,
    FOREIGN KEY (site_id) REFERENCES sites(site_id),
    UNIQUE(site_id, var_name)
);

-- Table: import_metadata
CREATE TABLE IF NOT EXISTS import_metadata (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    import_date TEXT NOT NULL,
    period_start TEXT NOT NULL,
    period_end TEXT NOT NULL,
    max_nan_threshold REAL NOT NULL,
    total_files_processed INTEGER NOT NULL,
    total_series_imported INTEGER NOT NULL,
    processing_time_seconds REAL NOT NULL
);

-- Table: sites
CREATE TABLE IF NOT EXISTS sites (
    site_id INTEGER PRIMARY KEY AUTOINCREMENT,
    latitude REAL NOT NULL,
    longitude REAL NOT NULL,
    UNIQUE(latitude, longitude)
);

-- Index: idx_sites_coords
CREATE INDEX IF NOT EXISTS idx_sites_coords
ON sites(latitude, longitude);

-- Index: var_name_idx
CREATE INDEX IF NOT EXISTS var_name_idx
ON feature_series(var_name);

COMMIT TRANSACTION;
PRAGMA foreign_keys = on;
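
To illustrate how the schema above might be queried, here is a minimal sketch that joins sites and feature_series to fetch one series for a given location. It rests on assumptions that the DDL does not settle: the function name `load_series` is hypothetical, and the BLOB is assumed to hold raw float64 values in native byte order. The encoding actually used is defined by create_db.py and should be checked there. The demo values (coordinates, variable name, dates) are made up.

```python
import sqlite3
from array import array


def load_series(conn, latitude, longitude, var_name):
    """Return (values, start_date, end_date) for one site and variable, or None."""
    row = conn.execute(
        """
        SELECT fs.series, fs.start_date, fs.end_date, fs.total_values
        FROM feature_series AS fs
        JOIN sites AS s ON s.site_id = fs.site_id
        WHERE s.latitude = ? AND s.longitude = ? AND fs.var_name = ?
        """,
        (latitude, longitude, var_name),
    ).fetchone()
    if row is None:
        return None
    blob, start_date, end_date, total_values = row
    # ASSUMPTION: the BLOB is a flat run of float64 values in native byte order.
    # Check create_db.py for the encoding actually used before relying on this.
    values = array("d")
    values.frombytes(blob)
    if len(values) != total_values:
        raise ValueError("decoded length does not match total_values")
    return list(values), start_date, end_date


if __name__ == "__main__":
    # Demo on an in-memory database holding only the columns the query touches.
    # Coordinates, variable name, dates, and values are made up for illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sites (site_id INTEGER PRIMARY KEY, latitude REAL, longitude REAL);
        CREATE TABLE feature_series (
            site_id INTEGER, var_name TEXT, series BLOB,
            start_date TEXT, end_date TEXT, total_values INTEGER
        );
        """
    )
    conn.execute("INSERT INTO sites VALUES (1, 48.75, 2.25)")
    conn.execute(
        "INSERT INTO feature_series VALUES (1, 'demo_var', ?, '2012-01-01', '2012-01-03', 3)",
        (array("d", [3.5, 4.0, 2.25]).tobytes(),),
    )
    print(load_series(conn, 48.75, 2.25, "demo_var"))
```

Against the real database, the connection would instead be opened with `sqlite3.connect("data/thermal.db")`. Matching on exact REAL coordinates only works when the stored values are reproduced exactly, so looking up the site_id first may be more robust.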

Transitioning from Tutorial to Production

We cannot emphasize this enough: the model presented in this tutorial remains an offline prototype, trained and evaluated on ERA5 reanalysis data. Because a reanalysis is produced after the fact, it plays the role of a perfect weather forecast. To use the model in operational conditions (daily D+1 forecasting), several concrete adaptations are required:

Replace ERA5 temperatures with actual D+1 weather forecasts (from Météo-France, ECMWF, or another data provider). Performance will inevitably degrade slightly, by an amount that depends on the quality of each day's forecast. Adding more meteorological variables (wind speed, wind chill, solar radiation, cloud cover, \(T_{max}\), \(T_{min}\), etc.) may offset part of this loss, and drawing them from several geographical sites often improves robustness.
Implement regular model recalibration (every 1 to 3 months) using the most recent data to account for gradual shifts in consumption behavior.
Establish continuous performance monitoring (rolling MAE, residual analysis by day type: heatwaves, cold snaps, holiday bridges, etc.) to quickly detect any degradation.
Automate the entire pipeline, including RTE and weather data retrieval, feature engineering, inference, and forecast distribution.

This modular architecture, a stable base model supplemented by a residual corrector, is relatively easy to maintain and evolve over time.
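
The performance monitoring step listed above can be sketched as a rolling MAE compared against a reference level. This is only an illustration under stated assumptions: the function names `rolling_mae` and `degraded_windows`, the window length, and the tolerance factor of 1.2 are hypothetical choices, not values taken from the tutorial.

```python
from collections import deque


def rolling_mae(actual, predicted, window):
    """Return the MAE of each full window of `window` consecutive days."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = deque(maxlen=window)
    total = 0.0
    maes = []
    for a, p in zip(actual, predicted):
        err = abs(a - p)
        if len(errors) == window:
            # The oldest error is about to be evicted by append().
            total -= errors[0]
        errors.append(err)
        total += err
        if len(errors) == window:
            maes.append(total / window)
    return maes


def degraded_windows(maes, baseline_mae, tolerance=1.2):
    """Return indices of windows whose MAE exceeds baseline_mae * tolerance.

    ASSUMPTION: baseline_mae is a reference error level chosen by the user,
    for example the MAE measured on the offline test period.
    """
    return [i for i, m in enumerate(maes) if m > baseline_mae * tolerance]
```

In practice the same computation could be run separately per day type (heatwaves, cold snaps, holiday bridges) by filtering the input series before calling `rolling_mae`.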