Comparison with RTE Forecasts
A question naturally arises at this stage: how does our model compare against the operational forecasts published by RTE itself?
RTE obviously possesses resources far superior to those deployed in this tutorial: specialized teams, much higher-granularity real-time data, operational weather forecasts updated multiple times a day, and extensive additional information regarding power system operations (fleet maintenance, outages, cross-border exchanges, etc.).
To perform a comparison, we rely on the Prévision J-1 ("Day-Ahead Forecast") column found in the final annual Eco2mix files. This column is intended to represent the forecast generated the day before for the following day.
However, this column must be used with caution. The Eco2mix files used here are "final" datasets published ex-post. Some of the forecasts may therefore have been recalculated or corrected after the actual observations became known, which introduces a risk of data leakage.
Indeed, we noticed that for certain older years (notably 2019–2021), the MAE of this proxy forecast appears abnormally low, strongly suggesting ex-post revisions to the values.
Over our 2022–2024 test period, by contrast, the observed performance looks much more realistic: the MAE of the RTE day-ahead forecast is around 25,900 MWh. We therefore restrict the comparison to these years.
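The year-by-year check described above can be sketched as follows. This is a minimal illustration, not the tutorial's actual code: it assumes the Eco2mix data have been loaded into a pandas DataFrame with a DatetimeIndex, that the observed load sits in a column named `Consommation` (an assumption), and that any aggregation needed to express the error in MWh has been done beforehand. The toy values are invented purely to show the call.

```python
import pandas as pd

def mae_by_year(df, actual_col="Consommation", forecast_col="Prévision J-1"):
    """Mean absolute error of forecast_col against actual_col, per calendar year.

    Assumes df has a DatetimeIndex; the column names are assumptions.
    """
    # Keep only rows where both the observation and the forecast are present
    valid = df[[actual_col, forecast_col]].dropna()
    abs_err = (valid[forecast_col] - valid[actual_col]).abs()
    # Group the absolute errors by the year of each timestamp
    return abs_err.groupby(valid.index.year).mean()

# Toy frame with made-up numbers, only to demonstrate usage
toy = pd.DataFrame(
    {
        "Consommation": [100.0, 110.0, 90.0, 95.0],
        "Prévision J-1": [98.0, 113.0, 90.0, 105.0],
    },
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2022-01-01", "2022-01-02"]),
)
yearly_mae = mae_by_year(toy)
print(yearly_mae)
```

A year whose MAE is markedly lower than its neighbors is a candidate for ex-post revision and is best excluded from the comparison.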
Over this period, our final model achieves an MAE of approximately 24,307 MWh, compared with around 25,900 MWh for the RTE proxy forecast, i.e. roughly 6% lower.
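For such a comparison to be fair, both forecasts should be scored on exactly the same timestamps. The helper below is a hypothetical sketch of that idea (the function name and its arguments are not from the tutorial): it drops any position where the observation or either forecast is missing, then returns both MAEs and the relative gain of the model over the proxy.

```python
import numpy as np

def compare_mae(y_true, y_model, y_proxy):
    """Return (mae_model, mae_proxy, relative_gain) on the common valid rows."""
    y_true, y_model, y_proxy = (
        np.asarray(a, dtype=float) for a in (y_true, y_model, y_proxy)
    )
    # Keep only positions where all three series are available
    mask = ~(np.isnan(y_true) | np.isnan(y_model) | np.isnan(y_proxy))
    mae_model = float(np.mean(np.abs(y_model[mask] - y_true[mask])))
    mae_proxy = float(np.mean(np.abs(y_proxy[mask] - y_true[mask])))
    # Positive value means the model has a lower MAE than the proxy
    relative_gain = (mae_proxy - mae_model) / mae_proxy
    return mae_model, mae_proxy, relative_gain

# Toy values, invented for illustration only
result = compare_mae(
    y_true=[10.0, 20.0, 30.0, np.nan],
    y_model=[11.0, 19.0, 30.0, 5.0],
    y_proxy=[12.0, 18.0, np.nan, 5.0],
)
print(result)
```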
This result must, however, be interpreted with caution.
The meteorological variables used in this tutorial come from ERA5, which is a weather reanalysis dataset. Unlike a true operational weather forecast, ERA5 is produced ex-post by assimilating real observations into an atmospheric model. The data used here are therefore much closer to a "perfect reconstructed weather" scenario than to an actual forecast available at the moment consumption needs to be anticipated.
In other words, in an operational day-ahead setting, the ERA5 values for the day being predicted would not be available. They would have to be replaced by weather forecasts from operational models (ECMWF, Météo-France, etc.), which are inherently noisier and less accurate.
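One rough way to get a feel for this effect, without access to archived operational forecasts, is to perturb the reanalysis inputs with synthetic noise and watch how the error responds. The sketch below is an illustration under strong assumptions, not part of the tutorial's pipeline: the linear load-temperature relation, the noise level `sigma=1.5`, and all names are hypothetical, and real forecast errors are correlated in time and space rather than independent Gaussian draws.

```python
import numpy as np

def add_forecast_noise(values, sigma, seed=0):
    """Return a copy of `values` with zero-mean Gaussian noise (std `sigma`) added."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    return values + rng.normal(0.0, sigma, size=values.shape)

def mae(y_true, y_pred):
    """Mean absolute error between two array-likes."""
    return float(np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true))))

def predict(temp):
    """Toy stand-in for a fitted model: load falls linearly with temperature."""
    return 100.0 - 2.0 * temp

# Toy "reanalysis" temperatures and the load they generate (made-up relation)
temp_reanalysis = np.linspace(-5.0, 25.0, 200)
load = 100.0 - 2.0 * temp_reanalysis

# Replace the perfect temperatures with a noisy, forecast-like version
temp_noisy = add_forecast_noise(temp_reanalysis, sigma=1.5)

mae_clean = mae(load, predict(temp_reanalysis))
mae_noisy = mae(load, predict(temp_noisy))
print(mae_clean, mae_noisy)
```

In this toy setup the error with perfect inputs is zero by construction, so the whole of `mae_noisy` comes from the input noise; with a real model, the same experiment would only indicate how sensitive the reported MAE is to weather uncertainty.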
Thus, the comparison presented here should not be interpreted as a demonstration of superiority over RTE's industrial models. Rather, it shows that a relatively straightforward methodology, relying solely on public data and interpretable linear models, can already achieve a competitive level of performance.
This is precisely the objective of this tutorial: to demonstrate that it is possible to build a credible, robust, and non-trivial energy forecasting system using accessible tools and a rigorous methodology.