General Conclusion
Throughout this tutorial, we progressively built a complete one-day-ahead (D+1) forecasting pipeline for the daily discharge of the Mississippi River at St. Louis.
The first step consisted of identifying a reference station and constructing a regular, usable hydrological time series. This data preparation phase, often underestimated, proved essential for ensuring the reliability of the subsequent experiments.
We then investigated a first univariate forecasting problem using only the local streamflow history. This approach already showed that a simple sliding time window combined with linear regression can be remarkably competitive, outperforming several more sophisticated models.
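As a reminder of how little machinery this baseline needs, here is a minimal sketch of a sliding window feeding an ordinary least-squares fit. It is not the tutorial's exact code: the 7-day window, the synthetic series, the split index, and the helper names (make_windows, fit_linear, predict_linear) are all assumptions chosen for illustration.

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (X, y): each row of X holds `window`
    consecutive values and y holds the value of the following day."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window
    X = np.stack([series[i:i + window] for i in range(n)])
    y = series[window:]
    return X, y

def fit_linear(X, y):
    """Ordinary least squares with an intercept column appended to X."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply the fitted coefficients (last entry is the intercept)."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Synthetic stand-in for a daily discharge series (assumption, not real data).
rng = np.random.default_rng(0)
t = np.arange(400)
flow = 5000 + 1500 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 50, t.size)

X, y = make_windows(flow, window=7)  # hypothetical 7-day window
split = 300                          # chronological split, no shuffling
coef = fit_linear(X[:split], y[:split])
pred = predict_linear(coef, X[split:])

# Compare against a persistence forecast (tomorrow equals today).
rmse = float(np.sqrt(np.mean((pred - y[split:]) ** 2)))
persistence_rmse = float(np.sqrt(np.mean((X[split:, -1] - y[split:]) ** 2)))
print(f"linear RMSE: {rmse:.1f}  persistence RMSE: {persistence_rmse:.1f}")
```

Keeping the split chronological matters here: shuffling the windows would leak future values into the training set and make the baseline look better than it is.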
In a second stage, we enriched the model by incorporating twelve additional hydrological stations distributed along the Mississippi and Missouri Rivers. The results confirm that upstream observations contain valuable predictive information: they make it possible to anticipate the propagation of the hydrological signal toward St. Louis and significantly improve forecasting performance.
Finally, we incorporated meteorological information derived from Copernicus ERA5. Rather than using several thousand grid points directly, we constructed a simplified spatial representation based on a small number of aggregated geographical regions. This approach offers an effective compromise between the richness of the information and the complexity of the model. The experiments show that a spatial representation of precipitation further improves forecasting performance, whereas a single average over the entire basin yields a much smaller gain.
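The regional aggregation idea can be sketched in a few lines. The example below is illustrative only: the grid extent, the 0.25 degree spacing, the four rectangular boxes, and the function name aggregate_regions are assumptions, and the precipitation field is random rather than real ERA5 data. It also uses a plain unweighted mean; a more careful version would likely weight cells by the cosine of latitude.

```python
import numpy as np

def aggregate_regions(precip, lats, lons, regions):
    """Average a (time, lat, lon) field over rectangular boxes.

    `regions` is a list of (lat_min, lat_max, lon_min, lon_max) tuples,
    with lower bounds inclusive and upper bounds exclusive.
    Returns an array of shape (time, n_regions).
    """
    features = []
    for lat_min, lat_max, lon_min, lon_max in regions:
        lat_mask = (lats >= lat_min) & (lats < lat_max)
        lon_mask = (lons >= lon_min) & (lons < lon_max)
        box = precip[:, lat_mask][:, :, lon_mask]
        features.append(box.mean(axis=(1, 2)))  # unweighted spatial mean
    return np.column_stack(features)

# Hypothetical grid: 80 x 120 = 9600 points (assumption).
lats = np.arange(30.0, 50.0, 0.25)
lons = np.arange(-115.0, -85.0, 0.25)

# Random stand-in for 10 days of daily precipitation (not real data).
rng = np.random.default_rng(1)
precip = rng.gamma(shape=2.0, scale=1.5, size=(10, lats.size, lons.size))

# Four hypothetical boxes that tile the grid.
regions = [
    (30.0, 40.0, -115.0, -100.0),
    (30.0, 40.0, -100.0, -85.0),
    (40.0, 50.0, -115.0, -100.0),
    (40.0, 50.0, -100.0, -85.0),
]

regional = aggregate_regions(precip, lats, lons, regions)  # (10, 4) features
basin_mean = precip.mean(axis=(1, 2))                      # (10,) single feature
print(regional.shape, basin_mean.shape)
```

The contrast between the two outputs is the point: the regional features keep a coarse picture of where the rain fell, while the basin-wide mean collapses that information into one number per day.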
Beyond the numerical results, several important lessons emerge.
The first is that, in time series forecasting, the quality of data preparation and feature engineering is often more important than the choice of a particularly sophisticated algorithm.
The second is that progressively enriching the set of input features leads to steady improvements in performance: local historical observations, upstream hydrological stations, and meteorological data each contribute complementary predictive information.
Finally, this work illustrates a general methodology that can be applied to many forecasting problems: begin with a simple model, establish a strong baseline, and then progressively incorporate additional sources of information while objectively evaluating their contribution.
The pipeline presented here should therefore be regarded as a foundation rather than a final solution. Many avenues for further investigation remain open. A more detailed study could, for example, analyze performance by season, distinguish between low-flow and flood periods, evaluate the models' ability to anticipate extreme events, compare different forecasting horizons, or incorporate more sophisticated learning architectures. These extensions, however, lie beyond the scope of this tutorial, whose primary objective is to present a reproducible and progressive methodology.
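As a starting point for the first of these extensions, a seasonal breakdown of the error can be sketched as follows. This is a hedged illustration, not part of the tutorial's pipeline: the function name rmse_by_season, the use of Northern Hemisphere meteorological seasons, and the artificial observed and predicted values are all assumptions.

```python
import math
from collections import defaultdict
from datetime import date, timedelta

# Meteorological seasons, Northern Hemisphere (assumption).
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def rmse_by_season(dates, observed, predicted):
    """Group squared errors by season and return one RMSE per season."""
    squared_errors = defaultdict(list)
    for d, obs, pred in zip(dates, observed, predicted):
        squared_errors[SEASONS[d.month]].append((obs - pred) ** 2)
    return {s: math.sqrt(sum(v) / len(v)) for s, v in squared_errors.items()}

# Artificial example: one leap year of daily values (not real data).
dates = [date(2020, 1, 1) + timedelta(days=i) for i in range(366)]
observed = [5000.0 + i for i in range(366)]

# Pretend the model is off by 100 in spring and by 10 the rest of the year.
predicted = [
    obs + (100.0 if SEASONS[d.month] == "spring" else 10.0)
    for d, obs in zip(dates, observed)
]

scores = rmse_by_season(dates, observed, predicted)
for season, value in sorted(scores.items()):
    print(f"{season}: {value:.1f}")
```

The same grouping pattern could be reused with a flow threshold instead of the calendar month to separate low-flow and flood periods.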
Author: Eric Duhamel