Differentiable HBV with Muskingum-Cunge routing (full version name δHBV2δMC2-Globe2-hydroDL, referred to as δHBV2 for brevity) is a hybrid, multiscale model that can learn from thousands of sites and output hydrologic fluxes and states at high spatial resolution. As it was only recently introduced in Song et al.50, it has not yet been used to generate global hydrologic insights, nor has it been compared with GWMs. In short, a neural network generates physical parameters for a differentiable implementation of the conceptual hydrological model Hydrologiska Byråns Vattenbalansavdelning (HBV)51. Rainfall-runoff processes are simulated with HBV's equations at small unit basins (MERIT network, median catchment size ~37 km2) and then routed downstream by a differentiable Muskingum-Cunge (MC) model (Methods). A total of 4746 basins with catchment areas less than 50,000 km2 are used for model training and testing. Evaluations are carried out on (1) 33 large global rivers with mixed anthropogenic influences (called mixed-anthropogenic-impact rivers or mixed-impact rivers for short) previously used in intercomparisons52 of six models (GWM0-GWM5) in ISIMIP2a53; (2) 28 other large rivers with less human impacts and a relatively minimal catchment area of 500,000 km2 (natural rivers); and (3) >5000 smaller basins with relatively continuous streamflow records.
In the following, we examine how basic hydrologic partitioning has changed between 2001 and 2020 and give practical examples of the implications for a number of US and European estuaries. To support the analysis in each case, we provide a benchmark against established GWMs in terms of matching various aspects of observations on large global rivers as well as smaller rivers. Using simulation benchmarks from both large and small global rivers, we explain how the differentiable model can better capture changes that are challenging for established GWMs.
Our assessment of terrestrial water partitioning relies on high-resolution simulations that can accurately capture water balances and their change trends. δHBV2 offers minimal bias for the long-term mean annual runoff (MAR) -- it has a mean absolute bias of 32.4 mm/yr for large natural rivers and 42.0 mm/yr for large mixed-impact rivers (Fig. 1a, b, detailed values in Supplementary Tables S1 and S2). These values are more than 20% lower than the biases of GWM4 (62.4 and 52.2 mm/yr for natural and mixed-impact rivers, respectively) and 55% lower than those of GWM5 (75.4 and 93.1 mm/yr). δHBV2 has only 3 out of the 61 major-river basins (Congo, Xingu, Paraguai) with absolute biases over 100 mm/yr, which is much fewer than GWM4 (13), GWM5 (17), and GWM1 (42). δHBV2's high spatial R values between MAR simulations and observations for mixed-impact (0.92) and natural (0.97) basins means most of the spatial variability in large-river MAR is explained by the model (Fig. 1d, e). The interannual variability in streamflow is also captured well by δHBV2, which achieves the lowest median annual-scale root-mean-square error (RMSE) values of 34.5 mm/yr and 39.8 mm/yr for mixed-impact and natural basins -- around 40% lower than GWM4 (Fig. 1a, b). Importantly, δHBV2 is also the only model that achieved R > 0.4 in describing the spatial variability of the temporal trends of the streamflow-to-precipitation ratio (Q/P, Fig. 1c, with detailed performance information in Supplementary Fig. S2) for the natural rivers. The most challenging trends are those in Africa, e.g., the Niger (GRDC 1834101) and Congo (GRDC 1147010) rivers, where the model could not capture the rising trend in Q/P due to a paucity of training data. Established GWMs tend to have R values lower than 0.13 and exhibit large scattering in the estimated Q/P trends around the observed value, since such spatial heterogeneity in response patterns is challenging to grasp. Because these 61 large river stations were not used in training the model, δHBV2's high performance is due to the collective knowledge gained from the numerous small basins used for big-data training.
With refreshingly high accuracy and resolution, δHBV2 reveals significant trends in annual green-blue-water partitioning for many regions over the last 20 years (trends in Fig. 1d; long-term averages in Supplementary Fig. S3). Here we demonstrate the partitioning using evapotranspiration-to-precipitation ratio (ET/P) with local runoff-to-precipitation ratio Q'/P provided in Supplementary Fig. S4. It should be noted that local Q'/P and ET/P may not sum to one each year due to storage effects. Blue water returns to the ocean in rivers, while green water is tied to plant water use and carbon/energy/nutrient cycles and exits as ET. Thus, ET/P reflects the most fundamental partitioning of the terrestrial water cycle. In general, midlatitudes in North America and Asia and tropical areas like Central America and Papua New Guinea have seen decreasing green water fluxes while Central Europe and subtropical and midlatitude regions in South America have seen them increasing. Some of these shifts are substantial -- with a 1% change in this ratio per year, some regions have thus shifted 20% over the course of 20 years. These changes are correlated with trends in precipitation (Supplementary Fig. S5): where precipitation increases, blue water tends to increase, and vice versa, although the patterns do not fully match. This suggests that large-scale climate shifts affect water partitioning, and increasing rainfall can overflow storage thresholds to increase blue water.
The shifts in local baseflow-to-runoff ratio (baseflow/Q'; trends in Fig. 1e, long-term averages in Supplementary Fig. S6, basin-scale data in Supplementary Fig. S7) have overlaps with the blue-green-water shifts, but are significantly more widespread. Due to the important role of groundwater discussed previously, these shifts imply pervasive changes in stream temperature and water quality characteristics at the decadal scale. Thus, the baseflow ratio should not be treated as static, which is currently the standard practice. This ratio also exhibits regional clustering that has not been noted before -- basins within a large region tend to have similar shifts, presumably reflecting decadal-scale trends in the regional climate. The two ratios move correspondingly because both reflect increases in runoff and decreases in infiltrated or land-retained water. The processes of groundwater recharge and ET then compete for infiltrated water, as water exceeding the soil's water-holding capacity moves to replenish deeper moisture and groundwater, which later becomes baseflow. ET/P changes are more muted than those of baseflow/Q', e.g., regions like India show noticeably rising baseflow/Q' but little change in ET/P. It can be explained that changes in precipitation and infiltration leave large imprints on recharge and thus baseflow, while the magnitude of ET responses may be limited by the soil's ability to hold water. On a side note, the baseflow ratios in Supplementary Fig. S6 are noticeably higher than those from gage-based separation methods due to conceptual and scale differences, and our simulated baseflow behavior is very similar to the observations if the same analysis method is applied (Supplementary Text S1 and Supplementary Fig. S7).
These shifts contribute to water excess and scarcity. Where blue-water fraction increases and baseflow ratio decreases in tandem (Fig. 1d, e, also see Supplementary Fig. S8), e.g., northeastern China, mid-latitude North America, and Papua New Guinea, there is a higher flooding potential, which has been documented in some studies. Where model-diagnosed MERIT-basin-scale ET/P increases substantially, it often results from precipitation declines. In fact, we find a negative correlation between changes in ET/P and precipitation in space and time (Supplementary Fig. S8). When annual precipitation declines, there is less excess water that can overcome the storage thresholds to become runoff, so more precipitated water exits as green-water fluxes, and blue water drops disproportionately. The prominent shifts toward green-water fluxes which occur in Germany, central Siberia, southern Brazil, central Chile, the Congo basin, and northern Australia are due to declining precipitation as discussed above (Supplementary Fig. S5) and suggest a disproportionate decrease in streamflow available for human use in these regions. These depicted shifts are mainly climate-driven, because while land use and human water use could have contributed to the shifts in water balance, these two processes are not explicitly simulated by the current version of the model (see "Limitations"). Because the baseflow process and hydrologic processes are scale-dependent, these fine-grained insights about baseflow and runoff need to be obtained from a data-trained high-resolution model like this one.
As a direct consequence of the shifts, we witness statistically significant trends in freshwater inputs to estuaries over the last two decades, with significant implications for ecosystems. The data-trained δHBV2 model identifies 10 stations flowing into estuaries out of the 55 analyzed as having significant declining trends in mean annual freshwater inflows from 2001 to 2020, mostly along the North Sea coast of Germany and France (Fig. 2b), which overlap with the increasing ET/P and baseflow/Q' ratios. The declining trends are substantial -- a few German sites decline more than 1.5% per year, amounting to a 30% decline in 20 years. The changes in ET/P ratio (Fig. 1d) in Europe contributed to this decline. In contrast, US Mid-Atlantic estuaries have an increasing trend from 2001 to 2020. The trends for some of the mildly increasing/decreasing stations in the USA and France are not statistically significant (shown with thin marker borders in Fig. 2b), but the regional clustering of such trends is clear. Freshwater declines can increase salinity and turbidity, which can drastically alter macrofaunal communities including fish as well as invertebrates like crustaceans, insects, and bivalves. Combined with global sea level rise, large changes in estuarine salinity and macrofauna are expected. Consistent with this expectation, it was reported that macrofauna abundance in German estuaries decreased by around 31% and the total biomass decreased by around 45% for approximately this same period. In that study, the changes were attributed entirely to sea level rise, but the decreasing freshwater inputs revealed here could have also played an important yet unacknowledged role.
Perhaps partly attributable to model limitations, the above-mentioned declines in freshwater inputs to European estuaries (Fig. 2c) have not previously been reported. δHBV2 describes noticeably different trends than established GWMs for these estuaries, as shown at sites where a comparison is possible (Fig. 2a). We found a stronger match of δHBV2 predictions with observed trends (reaching R values around 0.68) compared to the GWMs (GWM0: -0.05, GWM4: 0.46, with other models ranging between them). GWM4, for example, overestimated the freshwater declines of European sites in 2001-2010 while not showing any trend for the US sites. Freshwater inputs show clear decadal-scale oscillations, as US Atlantic estuaries have notable declines from 2001 to 2010 and overall rising trends from 2001 to 2020 (Fig. 2a, top row), but GWM0 largely misses the downward swing (Fig. 2a, bottom row). Since δHBV2's conceptual model backbone is not fundamentally different from other models, the parameterization through big-data training appears to have greatly improved the sensitivity to decadal-scale changes, allowing us to identify and predict these trends.
To support assessing models' seasonal rainfall-runoff responses (called elasticity), we evaluate GWMs' abilities to capture monthly flow fluctuations (Fig. 3a), monthly runoff autocorrelation (Fig. 3b) which relates to the recession behavior, and elasticity itself (Fig. 3c). For the natural rivers, δHBV2 ranks first among all models with a median correlation of 0.89 at the monthly scale (Fig. 3a), showing success at capturing the hydrological seasonality, although GWM0, which is enhanced by post-processing, achieves a slightly higher NSE. Unlike GMW0, δHBV2 does not impose a post-simulation bias correction and can thus ensure consistency for the internal hydrologic fluxes like ET, baseflow, and recharge, meaning these variables are also indirectly informed by the learning process and consistent with alternative estimates. The strong representation for autocorrelation (Fig. 3b for large natural rivers) suggests the model releases storage-dependent baseflow with the correct timing after conditioning by data, which is challenging for established GWMs. The established GWMs show substantial scattering in the arid regions for major rivers (high range of elasticity in Fig. 3c), while the data-trained model stays closer to the 1-to-1 line. Overall, in all categories of the evaluated hydrologic signatures (one value per basin) related to rainfall responses, δHBV2 scores the highest except in ACF(1) (autocorrelation function with 1 month lag), where it places third with a margin of 0.01 (Supplementary Table S3). The difference in winter elasticity is smaller than in summer, presumably because the response in the wetter regime is faster and more linear.
When examined regionally, δHBV2 tends to perform well in boreal or northern midlatitude rivers but tends to be challenged in river basins where human water use is significant, e.g., Rio Grande de Santiago in the "northern dry" category, Cooper Creek in "southern dry" and the Columbia River, for which a large fraction of the catchment is arid (Fig. 4). Other rivers with significant reservoirs, e.g., Danube and Missouri, can also be challenging. Among the 33 mixed-impact rivers, 19 rivers have at least one GWM (excluding GWM0 -- see discussion above) with a monthly NSE of 0.2 or above, and are regarded as meaningful for benchmarking (Fig. 4a). Excluding GWM0, δHBV2 has the highest monthly NSE (median ~0.77) for 15 of these 19 rivers (Fig. 4a).
The high resolution of δHBV2 enables the diagnosis of seasonal local runoff sensitivity to precipitation inputs (elasticity, ε), showing contrasting summer and winter ε values that mostly complement each other (Fig. 3d, e), with large summer ε values in arid and semi-arid regions. The precipitation that generates the highest seasonal ε has an aggregation length of 6 months for summer and 3 months for winter. With the exception of central-western North America where there is some overlap, the high values for summer and winter ε tend to be staggered (clustered in different regions), identifying these regions as being "summer responsive" or "winter responsive". Summer ε is the largest in arid and semi-arid regions, e.g., Sahel, central and south Africa, central and southern South America, central Asia, and northern China and Siberia (Fig. 3d). In these regions, summer ε is high due to a low runoff baseline caused by high evapotranspiration and the dominance of storage-dependent groundwater releases driven by precipitation accumulated in previous months. Winter ε has a smaller range in values than summer ε, likely due to more linear hydrological responses once key thresholds -- such as land abstraction -- are exceeded, resulting in more proportional runoff responses to precipitation. In addition, snow storage and gradual snowmelt could also reduce the runoff response to precipitation. Summer ε is highest in central Asia and the northern Middle East, western Australia, the northern and eastern African coasts, eastern Brazil, and southern Patagonia (Fig. 3e). In these regions, there is a more direct and immediate runoff response to winter rainfall. Such a refined understanding, to our knowledge, has not been offered before by a GWM.
For low-flow-dependent aquatic or riparian ecosystems, local seasonal ε is arguably more ecologically impactful than annual ε. High summer ε regions are vulnerable to precipitation changes, as reduction of seasonal precipitation there could have an outsized impact on summer low flows. Hydrologically, this is because the reduced rainfall would not be sufficient to exceed some storage thresholds, resulting in disproportionate reductions in streamflows (blue water). Low-flow-dependent aquatic ecosystems in these regions could thus be sensitive to long-term changes in the precipitation regime or season-long droughts. However, high summer ε also means stakeholders can use seasonal outlooks of precipitation to reliably predict streamflow (and inflow to the downstream water bodies) in the coming summer and prepare to intervene if possible. The ε patterns appear noticeably different from some previous work that employed the Budyko curve for crude analysis, partly because here we analyze seasonal ε rather than annual ε, and partly because a data-trained, high-resolution hydrologic model is now available.
To further validate the relevance of high-resolution δHBV2 for local water management, we compare its daily streamflow simulations in small-to-medium basins (<50,000 km²) with LSTM, lumped δHBV, and a widely-used operational global-scale product, GloFAS, as previous GWMs lack sufficient spatial resolution for this scale. We use several datasets (ds0-ds2, see "Methods") where comparisons are reasonable between different models. In a test spanning both training and testing basins, δHBV2 performs well, with a median daily NSE of 0.63 for all basins and 0.53 for ungauged basins, which is more than double GloFAS's median NSE of ~0.26 (Fig. 5a). On another set of basins where comparison with GWMs is possible, δHBV2 noticeably leads GloFAS which, in turn, leads GWM0 (Supplementary Fig. S9). The strong performance of δHBV2 stems from its big-data training for parameterization and generalization, which allows it to leverage data synergy. Distributed model δHBV2 shows a clear improvement compared to the original lumped model δHBV1.0 with its ability to resolve high-resolution heterogeneity; it even edges out LSTM in terms of KGE, with a median of 0.74 vs. 0.73 for the higher-performing basins (Fig. 5b). This advantage against δHBV1.0 is more pronounced in data-rich regions, e.g., North America, where the subbasin-scale simulations can be better constrained by small training basins (Supplementary Table S4). There is still a slight gap between LSTM and δHBV2 in the lower half for daily KGE values (bottom left-hand side of Fig. 5b), likely due to systematic forcing biases and unrepresented heavy water uses in some basins Both δHBV2 and LSTM thus represent the state of the art for smaller basins, allowing them to accurately capture the flow variability. However, LSTM cannot simulate diagnostic variables like ET, baseflow, recharge, soil moisture, and snowmelt for providing narratives to stakeholders, and can suffer more when extrapolating to data-scarce regions and unseen extreme events. Regardless, these results suggest that models with global coverage can finally be locally relevant for tasks such as flood forecasting and short-term water management.
The data-trained δHBV2 model showed large high-flow flashiness for arid and semi-arid regions and rather low flashiness for tropical rainforests with latitudes above 45 degrees (Fig. 5c). High-flow flashiness is quantified by the slopes of the flow duration curve between the 1st and 33rd exceedance percentiles (S) that are below -5 on the log scale, which corresponds to more than a 50 times difference between the 33rd and 1st percentile flows. Arid regions have the most negative slopes due to very low baseflows and prevalent quick, heavy storms. The eastern US, western and central Europe, and southern China have moderate S values of -3 to -1, as baseflow can be substantial. The tropical rainforests, including the Amazon, Congo, and those throughout Pacific Asia, have the smallest S, since even during the dry season, streamflow in these regions can be significant. While these results are not surprising, they had not been shown at the global scale using a high-resolution model.
Examining the changes in flashiness over time, we find widespread and statistically-significant but spatially-mixed trends that do not easily match other spatial patterns studied here (Fig. 5d). S prominently increases and becomes less negative in Mexico, western South America, and northern India, indicating a less variable distribution of streamflows in the last 20 years. However, central-western USA, southern Africa, the south fringe of the Sahara Desert, central Asia, and Ethiopia have seen large declining and thus more negative S values, highlighting increasing streamflow variability. These changes are regional but can be substantial; we believe they may be caused by changes in precipitation intensities. Increasing flashiness poses the need for a greater storage capacity to ensure water supply resilience as well as control floods, although ecological consequences must also be considered. High resolution GWMs consistent with daily data are crucial for these applications.
Besides water withdrawals and very cold or very dry climates, certain downstream riverine and hydrological processes including large natural inland lakes, wetlands, and major reservoirs pose challenges for δHBV2. Large inland lakes and wetlands can store water, attenuate peak flows, sustain base flows, and induce floodplain water losses. They are not well simulated by the Muskingum-Cunge formulation. Examples include the Neva (Lake Ladoga), Paraguai (the Pantanal wetlands), Amazon (floodplains), Winnipeg (Lake Winnipeg), and Saint Lawrence (the Great Lakes) rivers. Due to the sizes of such storages, their impacts can even be noticeable at the monthly time scale. Additionally, flood-control dams, hydroelectric dams, and dams for other purposes such as irrigation and recreation each operate with distinct objectives, further contributing to the complexity of their impacts. Combined with significant water withdrawals, it is challenging for the model to accurately capture such flow behaviors, which is only exacerbated in places with relatively few gages like Africa, e.g., the Niger and Congo rivers. Overall, due to structural limitations in both the HBV and Muskingum-Cunge models, δHBV2 has a limited capacity to represent anthropogenic impacts such as reservoirs, land use changes, and human water uses, although physical parameters learned from observational data can partially compensate for some of the resulting errors.
It is noteworthy that in our training and forward simulations, land-use inputs are regarded as static, so the model's long-term output shifts mainly reflect changes in the climate inputs. It is possible that the dynamic parameters produced by the NNs jointly trained on streamflow data can capture some interannual covariation of vegetation-related characteristics due to climate shifts, but we do not expect this to be a major effect. The impacts of land use change, e.g., in central Asia and Ethiopia, are not explicitly simulated, which should be considered in future efforts.
While we do not have an ensemble of models to produce a formal uncertainty estimate, we present maps of large basin and upstream-catchment evaluations (Supplementary Fig. S2) as a gauge of model reliability in different regions at different scales. Some large basins, e.g., Ganges, Siberian, and southern Brazilian ones, are well simulated in terms of both NSE and long-term trends, despite not having any training basins. This suggests the model does possess some ability to generalize in space. Nevertheless, some poor-performing basins still tend to be due to a lack of training stations, especially African and north-central Asian ones, where hydrologic dynamics may also be systematically different from the training basins. These regions are expected to improve when we learn from additional data, including expanded training stations, new streamflow estimates from the Surface Water and Ocean Topography (SWOT) mission, and non-streamflow observations such as soil moisture. Future efforts can also leverage an ensemble based on different training data to assess uncertainty.
Driven by climatic shifts, the terrestrial water cycle is undergoing significant changes in quantity, timing, and hydrologic response patterns. Using our high-resolution, high-accuracy, physics-embedded model, we identified coherent and widespread shifts between 2001 and 2020 in fundamental water partitioning -- between evapotranspiration and runoff as well as between surface and subsurface runoff. These changes are primarily nonlinear responses to precipitation variations. For example, North America and parts of Asia have seen increased blue-water and surface runoff fractions leading to higher flood risks, while the Southern Hemisphere, tropical rainforests, and parts of Europe have experienced increases in green-water fluxes (i.e., evapotranspiration) and baseflow fractions leading to reduced river flows. Consequently, freshwater inputs into some European estuaries have declined markedly, with associated ecological impacts. Arid and semi-arid regions already show high flow variability and runoff elasticity, making them especially vulnerable to future shifts in seasonal precipitation patterns. Some arid regions also see precipitation pattern changes leading to even higher flow variability. We provide a high-resolution map of the response patterns and changes to identify future challenges in water supplies and aquatic ecosystems.
While understanding hydrologic response patterns is helpful for water management, analyzing them and their changes requires a model that can accurately diagnose hydrologic fluxes with high spatiotemporal resolution. Traditional models often struggle to extract information from large datasets to characterize the landscape's hydrologic responses, manifesting in the difficulty to describe the spatial distributions of hydrologic signatures and their trends. Meanwhile, purely data-driven methods do not provide diagnostic variables or respect physical laws like mass conservation. Our approach -- differentiable, physics-embedded learning -- addresses both limitations. It offers a globally-consistent, high-resolution, physically-coherent picture of how hydrologic response patterns are shifting in response to the climate, enabling local-scale fine-grained analyses and decision-making while revealing many previously unrecognized changes described above. This modeling capability is essential for understanding and managing future water availability, aquatic ecosystem risks, and hydrologic resilience in a changing climate.