Comparison with RTE Forecasts

A question naturally arises at this stage: how does our model compare against the operational forecasts published by RTE itself?

RTE obviously possesses resources far superior to those deployed in this tutorial: specialized teams, much higher-granularity real-time data, operational weather forecasts updated multiple times a day, and extensive additional information regarding power system operations (fleet maintenance, outages, cross-border exchanges, etc.).

To perform a comparison, we rely on the Prévision J-1 ("Day-Ahead Forecast") column found in the final annual Eco2mix files. This column is intended to represent the forecast generated the day before for the following day.

However, this column must be used with caution. The Eco2mix files used here are "final" datasets published ex-post. Some of the forecasts may therefore have been recalculated or corrected after the actual observations became known, which introduces a risk of data leakage.

Indeed, we noticed that for certain older years (notably 2019–2021), the MAE of this proxy forecast appears abnormally low, strongly suggesting ex-post revisions to the values.
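One way to run this kind of sanity check is to compute the MAE of the proxy forecast separately for each year and look for values that stand out. The sketch below is only illustrative: the function name `mae_by_year`, the `(year, actual, forecast)` row layout, and the numbers are assumptions made for the example, not the tutorial's actual code or real Eco2mix values.

```python
from collections import defaultdict


def mae_by_year(rows):
    """Return {year: MAE} for an iterable of (year, actual, forecast) rows."""
    abs_errors = defaultdict(list)
    for year, actual, forecast in rows:
        abs_errors[year].append(abs(actual - forecast))
    # Average the absolute errors within each year, in chronological order.
    return {year: sum(errs) / len(errs) for year, errs in sorted(abs_errors.items())}


# Made-up values for illustration only (not real Eco2mix data).
rows = [
    (2020, 1000.0, 1002.0),
    (2020, 1100.0, 1096.0),
    (2023, 1000.0, 1030.0),
    (2023, 1100.0, 1050.0),
]

yearly_mae = mae_by_year(rows)
for year, mae in yearly_mae.items():
    print(year, mae)
```

A year whose MAE is far smaller than the others, as 2020 is in this toy data, would be a candidate for closer inspection before being used as a benchmark.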

Over our 2022–2024 test period, by contrast, the performance of this proxy looks much more realistic, so we restrict the comparison to these years.

Over this period, our final model achieves an MAE of approximately 24,307 MWh, compared with around 25,900 MWh for the RTE day-ahead proxy forecast.
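To put these two figures in perspective, the gap can be expressed in absolute and relative terms. The snippet below simply redoes the arithmetic on the rounded values quoted above; since both MAEs are approximate, the resulting percentage should be read as an order of magnitude rather than a precise result.

```python
# Approximate MAE values quoted in the text, in MWh.
model_mae = 24_307.0
rte_proxy_mae = 25_900.0

absolute_gap = rte_proxy_mae - model_mae      # MWh
relative_gap = absolute_gap / rte_proxy_mae   # fraction of the proxy's MAE

print(f"{absolute_gap:.0f} MWh, {relative_gap:.1%}")  # prints "1593 MWh, 6.2%"
```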

This result must, however, be interpreted with caution.

The meteorological variables used in this tutorial come from ERA5, an atmospheric reanalysis produced ex-post by assimilating real-world observations into a global weather model. ERA5 data is generally more homogeneous, more complete, and less noisy than the weather data available in a real operational context.

It is important to note, however, that our model uses only past weather observations, from day D−14 through day D−1, to forecast consumption on day D. The data for D−14 through D−2 therefore poses no particular difficulty in a real operational setting: when the forecast for day D is generated, these observations are already known and generally available.
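The D−14 to D−1 window can be made concrete with a small helper that collects, for a target day D, only the values from the fourteen preceding days. This is a minimal sketch under stated assumptions: the name `lagged_window`, the dictionary keyed by date, and the synthetic temperatures are hypothetical and are not taken from the tutorial's code.

```python
from datetime import date, timedelta


def lagged_window(daily_values, target_day, first_lag=14, last_lag=1):
    """Return the values for D-first_lag .. D-last_lag, oldest first.

    Raises KeyError if any day in the window is missing, so that gaps
    are not silently ignored.
    """
    return [
        daily_values[target_day - timedelta(days=lag)]
        for lag in range(first_lag, last_lag - 1, -1)
    ]


# Synthetic daily series for illustration: the value equals the day index.
start = date(2024, 1, 1)
daily_temp = {start + timedelta(days=i): float(i) for i in range(20)}

target = date(2024, 1, 16)  # day index 15
window = lagged_window(daily_temp, target)
print(len(window), window[0], window[-1])
```

Because the window stops at D−1, the value observed on day D itself never enters the features, which is what keeps the setup free of same-day leakage.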

The situation is slightly more sensitive for the day D−1 variables. Depending on the exact time the forecast is generated, certain observations or daily aggregates for D−1 may not yet be fully available. In a real operational system, the most recent weather information would then have to be replaced by partial real-time observations or by weather forecasts from operational numerical models (ECMWF, Météo-France, etc.), which are inherently noisier and more uncertain than the ERA5 data used here.

Finally, the ERA5 fields used in this tutorial offer dense, regular, and homogeneous spatial coverage across the entire territory. In a real-world system based on operational observations, the available measurements would typically come from a more irregular, partially incomplete, and potentially noisy network of weather stations. The spatial aggregates used here (national averages, minimums, or maximums) would consequently be more sensitive to spatial sampling issues and measurement uncertainties.
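The sensitivity of these aggregates to spatial sampling can be illustrated with a toy comparison between a dense grid and a sparse subset of points standing in for a station network. Everything in this sketch is hypothetical (the helper `national_aggregates`, the values, and the choice of subset); it only shows the mechanism and does not use ERA5 or station data.

```python
from statistics import fmean


def national_aggregates(values):
    """Return the mean, minimum, and maximum of a collection of point values."""
    values = list(values)
    return {"mean": fmean(values), "min": min(values), "max": max(values)}


# Toy temperatures (°C) on a dense, regular grid.
grid_temps = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

# Hypothetical sparse network that only samples the first three grid points.
station_temps = [grid_temps[i] for i in (0, 1, 2)]

dense = national_aggregates(grid_temps)
sparse = national_aggregates(station_temps)
print(dense)
print(sparse)
```

In this toy case the sparse network misses the warmest points, so both its mean and its maximum are biased low relative to the dense grid.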

Thus, the comparison presented here should not be interpreted as a demonstration of superiority over RTE's industrial models. Rather, it shows that a relatively straightforward methodology, relying solely on public data and interpretable linear models, can already achieve a competitive level of performance.

This is precisely the objective of this tutorial: to demonstrate that it is possible to build a credible, robust, and non-trivial energy forecasting system using accessible tools and a rigorous methodology.