Appendices
Source Code and Data Repository
The source code is available in the following GitHub repository: [Link].
Question: Since the thermal.db database is 1.4 GB, should it also be hosted on GitHub, or should it be published elsewhere (e.g., Zenodo or another platform)?
SQLite Database DDL
--
-- File generated with SQLiteStudio v3.4.17 on Tue May 12 14:36:21 2026
--
-- Text encoding used: System
--
PRAGMA foreign_keys = off;
BEGIN TRANSACTION;
-- Table: feature_series
CREATE TABLE IF NOT EXISTS feature_series (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    site_id INTEGER NOT NULL,
    var_name TEXT NOT NULL,
    var_description TEXT,
    unit TEXT,
    short_var_name TEXT,
    series BLOB NOT NULL,
    start_date TEXT NOT NULL,
    end_date TEXT NOT NULL,
    total_values INTEGER NOT NULL,
    original_nan_count INTEGER NOT NULL,
    nan_percentage REAL NOT NULL,
    max_consecutive_nans INTEGER NOT NULL,
    nans_interpolated INTEGER NOT NULL,
    interpolation_method TEXT,
    data_quality TEXT NOT NULL,
    quality_score REAL NOT NULL,
    import_timestamp TEXT NOT NULL,
    origin TEXT,
    FOREIGN KEY (site_id) REFERENCES sites(site_id),
    UNIQUE(site_id, var_name)
);
-- Table: import_metadata
CREATE TABLE IF NOT EXISTS import_metadata (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    import_date TEXT NOT NULL,
    period_start TEXT NOT NULL,
    period_end TEXT NOT NULL,
    max_nan_threshold REAL NOT NULL,
    total_files_processed INTEGER NOT NULL,
    total_series_imported INTEGER NOT NULL,
    processing_time_seconds REAL NOT NULL
);
-- Table: sites
CREATE TABLE IF NOT EXISTS sites (
    site_id INTEGER PRIMARY KEY AUTOINCREMENT,
    latitude REAL NOT NULL,
    longitude REAL NOT NULL,
    UNIQUE(latitude, longitude)
);
-- Index: idx_sites_coords
CREATE INDEX IF NOT EXISTS idx_sites_coords
    ON sites(latitude, longitude);
-- Index: var_name_idx
CREATE INDEX IF NOT EXISTS var_name_idx
    ON feature_series(var_name);
COMMIT TRANSACTION;
PRAGMA foreign_keys = on;
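To illustrate how the schema above might be queried, here is a minimal Python sketch that looks up one series by site coordinates and variable name. It runs against an in-memory database built from a trimmed copy of the DDL rather than against thermal.db itself. Two points are assumptions, not facts from the DDL: the encoding of the `series` BLOB (raw float64 values in native byte order is assumed here; the real database may use another format) and the sample values (`t2m`, the coordinates, the dates), which are purely hypothetical. The helper names `encode_series`, `decode_series`, and `load_series` are likewise illustrative.

```python
import sqlite3
from array import array

# Trimmed copy of the DDL above: only the columns this sketch touches.
SCHEMA = """
CREATE TABLE sites (
    site_id INTEGER PRIMARY KEY AUTOINCREMENT,
    latitude REAL NOT NULL,
    longitude REAL NOT NULL,
    UNIQUE(latitude, longitude)
);
CREATE TABLE feature_series (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    site_id INTEGER NOT NULL,
    var_name TEXT NOT NULL,
    series BLOB NOT NULL,
    start_date TEXT NOT NULL,
    end_date TEXT NOT NULL,
    FOREIGN KEY (site_id) REFERENCES sites(site_id),
    UNIQUE(site_id, var_name)
);
"""


def encode_series(values):
    # ASSUMPTION: the BLOB holds raw float64 values in native byte order.
    return array("d", values).tobytes()


def decode_series(blob):
    # Inverse of encode_series, under the same assumed encoding.
    out = array("d")
    out.frombytes(blob)
    return list(out)


def load_series(conn, latitude, longitude, var_name):
    """Return (values, start_date, end_date) for one site and variable, or None."""
    row = conn.execute(
        """
        SELECT fs.series, fs.start_date, fs.end_date
        FROM feature_series AS fs
        JOIN sites AS s ON s.site_id = fs.site_id
        WHERE s.latitude = ? AND s.longitude = ? AND fs.var_name = ?
        """,
        (latitude, longitude, var_name),
    ).fetchone()
    if row is None:
        return None
    blob, start_date, end_date = row
    return decode_series(blob), start_date, end_date


# Hypothetical sample data, inserted only so the query has something to find.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
cur = conn.execute(
    "INSERT INTO sites (latitude, longitude) VALUES (?, ?)", (48.75, 2.25)
)
conn.execute(
    "INSERT INTO feature_series (site_id, var_name, series, start_date, end_date) "
    "VALUES (?, ?, ?, ?, ?)",
    (cur.lastrowid, "t2m", encode_series([4.5, 5.0, 3.25]), "2024-01-01", "2024-01-03"),
)
result = load_series(conn, 48.75, 2.25, "t2m")
```

The `UNIQUE(site_id, var_name)` constraint means the lookup returns at most one row, which is why `fetchone()` is enough here.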
Transitioning from Tutorial to Production
We cannot emphasize this enough: the model presented in this tutorial remains an offline prototype, trained and evaluated on ERA5 reanalysis data, which plays the role of a perfect weather forecast. To use it in operational conditions (daily D+1 forecasting), several concrete adaptations are required:
- Replace ERA5 temperatures with actual D+1 weather forecasts (from Météo-France, ECMWF, or a data provider). This will inevitably degrade performance somewhat, by an amount that depends on the quality of that day's weather forecast. The degradation could potentially be offset by using more meteorological variables (wind speed, wind chill, solar radiation, cloud cover, \(T_{max}\), \(T_{min}\), etc.), potentially across several geographical sites, which often improves robustness.
- Implement regular model recalibration (every 1 to 3 months) using the most recent data to account for gradual shifts in consumption behavior.
- Establish continuous performance monitoring (rolling MAE, residual analysis by day type: heatwaves, cold snaps, holiday bridges, etc.) to quickly detect any degradation.
- Automate the entire pipeline, including RTE and weather data retrieval, feature engineering, inference, and forecast distribution.
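The monitoring step in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the day-type labels, the alert rule (latest rolling MAE above 1.5 times a reference MAE), and all numbers are hypothetical and do not come from the tutorial's data.

```python
from collections import defaultdict, deque


def rolling_mae(actual, forecast, window):
    """Mean absolute error over the trailing `window` points, one value per step."""
    errors = deque(maxlen=window)  # old errors drop out automatically
    out = []
    for a, f in zip(actual, forecast):
        errors.append(abs(a - f))
        out.append(sum(errors) / len(errors))
    return out


def mae_by_day_type(actual, forecast, day_types):
    """Mean absolute error grouped by a day-type label (heatwave, cold snap, ...)."""
    buckets = defaultdict(list)
    for a, f, d in zip(actual, forecast, day_types):
        buckets[d].append(abs(a - f))
    return {d: sum(errs) / len(errs) for d, errs in buckets.items()}


def needs_attention(rolling, baseline_mae, tolerance=1.5):
    # ASSUMPTION: a simple threshold rule; the factor 1.5 is arbitrary.
    return rolling[-1] > tolerance * baseline_mae


# Hypothetical daily values, for illustration only.
actual = [50.0, 52.0, 61.0, 70.0]
forecast = [49.0, 54.0, 58.0, 64.0]
day_types = ["normal", "normal", "cold_snap", "cold_snap"]

rolling = rolling_mae(actual, forecast, window=2)          # [1.0, 1.5, 2.5, 4.5]
per_type = mae_by_day_type(actual, forecast, day_types)    # normal 1.5, cold_snap 4.5
alert = needs_attention(rolling, baseline_mae=2.0)         # True: 4.5 > 3.0
```

In this toy example the error is concentrated on the cold-snap days, which is the kind of pattern the per-day-type breakdown is meant to surface before it shows up in the overall average.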
This modular architecture, a stable base model supplemented by a residual corrector, is relatively easy to maintain and evolve over time.
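As a rough illustration of the base-plus-corrector pattern, the sketch below fits a base model, fits a second model on the base model's residuals, and adds the two predictions. Both components are deliberately simple stand-ins chosen for this example (a one-variable least-squares line and a per-group mean residual); they are not the models used in the tutorial, and every name and number is hypothetical.

```python
from statistics import mean


class LinearBase:
    """Stand-in base model: load = a + b * temperature (ordinary least squares)."""

    def fit(self, temps, loads):
        t_bar, y_bar = mean(temps), mean(loads)
        var = sum((t - t_bar) ** 2 for t in temps)
        cov = sum((t - t_bar) * (y - y_bar) for t, y in zip(temps, loads))
        self.b = cov / var
        self.a = y_bar - self.b * t_bar
        return self

    def predict(self, temps):
        return [self.a + self.b * t for t in temps]


class GroupMeanCorrector:
    """Stand-in corrector: mean residual per group label (e.g. a day type)."""

    def fit(self, groups, residuals):
        by_group = {}
        for g, r in zip(groups, residuals):
            by_group.setdefault(g, []).append(r)
        self.offsets = {g: mean(rs) for g, rs in by_group.items()}
        return self

    def predict(self, groups):
        # Unseen groups get no correction.
        return [self.offsets.get(g, 0.0) for g in groups]


def fit_pipeline(temps, groups, loads):
    base = LinearBase().fit(temps, loads)
    # The corrector is trained on what the base model got wrong.
    residuals = [y - p for y, p in zip(loads, base.predict(temps))]
    corrector = GroupMeanCorrector().fit(groups, residuals)
    return base, corrector


def predict(base, corrector, temps, groups):
    return [b + c for b, c in zip(base.predict(temps), corrector.predict(groups))]


# Hypothetical training data, for illustration only.
temps = [0.0, 0.0, 10.0, 10.0]
groups = ["weekday", "weekend", "weekday", "weekend"]
loads = [62.0, 58.0, 42.0, 38.0]

base, corrector = fit_pipeline(temps, groups, loads)
forecast = predict(base, corrector, [5.0], ["weekend"])  # base 50.0, correction -2.0
```

Because the two parts are fitted separately, the corrector could in principle be refitted on recent residuals without retraining the base model, which is one way to read the maintainability claim above.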