Predicting Urban Reservoir Levels Using Statistical Learning Techniques
Currently, about 50% of the world population lives in cities, and the World Bank has projected that by 2050, this number will grow to 65%. When paired with a changing hydrological environment, including an increased likelihood of droughts, rapid urban growth puts cities and their watersheds in a vulnerable position.
This is an interesting paper that employs supervised learning techniques to predict reservoir levels.
The main focus of the study was Atlanta, Georgia, although Indianapolis, Indiana and Austin, Texas were also included in the analysis. The authors compare the predictive power of a number of models:
- Generalized linear model (GLM)
- Generalized additive model (GAM)
- Multivariate adaptive regression splines (MARS)
- Classification and regression tree (CART)
- Bagged CART
- Random forest
- Support vector machine (SVM)
- Bayesian additive regression tree (BART)
and attempt to understand which predictors (hydrological system inputs and outputs), such as precipitation, streamflow, population, dew point temperature, humidity, water use, and soil moisture, contribute the most to predictive accuracy. Not surprisingly, the importance of each predictor varies with the system. Population and the El Niño/Southern Oscillation (ENSO) index appear to have the largest relative effect. Interestingly, local rainfall (precipitation) was the least important variable.
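To make the idea of ranking predictors concrete, here is a minimal sketch of permutation importance: fit a model, shuffle one predictor at a time in held-out data, and see how much the error grows. This is not the authors' code and I have not checked which importance measure the paper uses. The data below are synthetic, the coefficients are invented purely for illustration, and a plain least-squares fit stands in for the simplest GLM (Gaussian, identity link). The helper names (`fit_ols`, `permutation_importance`) are my own.

```python
import random
import statistics


def fit_ols(X, y):
    """Least-squares fit with an intercept, solved via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[c])]
    return [A[i][k] for i in range(k)]


def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]


def rmse(y, yhat):
    return (sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)) ** 0.5


def permutation_importance(beta, X, y, names, rng, repeats=20):
    """Mean increase in RMSE when one predictor column is shuffled."""
    base = rmse(y, predict(beta, X))
    result = {}
    for j, name in enumerate(names):
        increases = []
        for _ in range(repeats):
            col = [r[j] for r in X]
            rng.shuffle(col)
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
            increases.append(rmse(y, predict(beta, X_perm)) - base)
        result[name] = statistics.mean(increases)
    return result


rng = random.Random(0)
names = ["streamflow", "dew_point", "precipitation"]

# Synthetic, standardized predictors. By construction (an assumption, not a
# result from the paper) the level depends strongly on streamflow, weakly on
# dew point, and not at all on precipitation.
X = [[rng.gauss(0, 1) for _ in names] for _ in range(400)]
y = [3.0 * r[0] + 1.0 * r[1] + rng.gauss(0, 0.5) for r in X]

# Hold out the last 100 rows so importance is measured out of sample.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

beta = fit_ols(X_train, y_train)
importance = permutation_importance(beta, X_test, y_test, names, rng)
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

The same loop works for any fitted model that exposes a predict function, which is what makes it handy for comparing importance across model families like the ones listed above. In practice you would reach for a statistics or machine learning library rather than hand-rolling the fit.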
The data and supplemental notes on the methodology are available on the Nature website. At some point I'll go back and look at this in more detail. It would be interesting to see if this could be applied to Cape Town.
The most important variables were the streamflow (into the reservoir), dew point temperature, and population, followed by soil moisture and the El Niño/Southern Oscillation (ENSO) index. Conversely, precipitation was the least important variable when trying to predict reservoir level.