Add functions to partition uncertainties according to method from Haw…

…kins & Sutton (2009) (#1262)   ### Pull Request Checklist: - [x] This PR addresses an already opened issue (for bug fixes / features) - This PR fixes #771 - [x] Tests for the changes have been added (for bug fixes / features) - [ ] (If applicable) Documentation has been added / updated (for bug fixes / features) - [x] HISTORY.rst has been updated (with summary of main changes) - [x] Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added ### What kind of change does this PR introduce? * Add functions to compute variance using method from Hawkins & Sutton (2009). ### Does this PR introduce a breaking change? No ### Other information: Only tested qualitatively so far. Looks ok but I'd like a more solid validation. See related PRs: - Ouranosinc/xclim-testdata#23 - Ouranosinc/figanos#9
Ouranosinc · Jun 28, 2023 · 84e7644 · 84e7644
2 parents 94e7910 + 09b2ae7
commit 84e7644
Show file tree

Hide file tree

Showing 8 changed files with 531 additions and 9 deletions.
diff --git a/CHANGES.rst b/CHANGES.rst
@@ -4,7 +4,11 @@ Changelog
 
 v0.45.0 (unreleased)
 --------------------
-Contributors to this version: Trevor James Smith (:user:`Zeitsperre`).
+Contributors to this version: David Huard (:user:`huard`), Trevor James Smith (:user:`Zeitsperre`).
+
+New features and enhancements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+* Added ``ensembles.hawkins_sutton`` method to partition the uncertainty sources in a climate projection ensemble. (:issue:`771`, :pull:`1262`).
 
 Internal changes
 ^^^^^^^^^^^^^^^^
@@ -196,11 +200,11 @@ New features and enhancements
 
 New indicators
 ^^^^^^^^^^^^^^
-* ``ensembles.change_significance`` now supports Mann-whitney U-test and flexible ``realization``. (:pull:`1285`).
 * New indices and indicators for converting from snow water equivalent to snow depth (``snw_to_snd``) and snow depth to snow water equivalent (``snd_to_snw``) using snow density [kg/m^3]. (:pull:`1271`).
 * New indices and indicators for determining upwelling radiation (`shortwave_upwelling_radiation_from_net_downwelling` and `longwave_upwelling_radiation_from_net_downwelling`; CF variables `rsus` and `rlus`) from net and downwelling radiation (shortwave: `rss` and `rsds`; longwave: `rls` and `rlds`). (:pull:`1271`).
-* New indice and indicator ``{snd | snw}_season_{length | start | end}`` which generalize ``snow_cover_duration`` and  ``continuous_snow_cover_{start | end}`` to allow using these functions with variable `snw` (:pull:`1275`).
+* New indice and indicator ``{snd | snw}_season_{length | start | end}`` which generalize ``snow_cover_duration`` and ``continuous_snow_cover_{start | end}`` to allow using these functions with variable `snw` (:pull:`1275`).
 * New indice and indicator (``dryness_index``) for estimating soil humidity classifications for winegrowing regions (based on Riou et al. (1994)). (:issue:`355`, :pull:`1235`).
+* ``ensembles.change_significance`` now supports Mann-whitney U-test and flexible ``realization``. (:pull:`1285`).
 
 Breaking changes
 ^^^^^^^^^^^^^^^^
@@ -216,7 +220,7 @@ Breaking changes
 
 Bug fixes
 ^^^^^^^^^
-* ``build_indicator_module_from_yaml`` now accepts a ``reload`` argument. When re-building a module that already exists, ``reload=True`` removes all previous indicator before creating the new ones. (:issue:`1192`, :pull:`1284`).
+* ``build_indicator_module_from_yaml`` now accepts a ``reload`` argument. When re-building a module that already exists, ``reload=True`` removes all previous indicator before creating the new ones. (:issue:`1192`,:pull:`1284`).
 * The test for french translations of official indicators was fixed and translations for CFFWIS indices, FFDI, KDBI, DF and Jetstream metric woollings have been added or fixed. (:pull:`1271`).
 * ``use_ufunc`` in ``windowed_run_count`` is now supplied with argument ``freq`` to warn users that the 1d method does not support resampling after run length operations (:issue:`1279`, :pull:`1291`).
 * ``{snd|snw}_max_doy`` now avoids an error due to `xr.argmax` when there are all-NaN slices. (:pull:`1277`).
@@ -650,7 +654,7 @@ Internal changes
 ^^^^^^^^^^^^^^^^
 * Added a CI hook in ``.pre-commit-config.yaml`` to perform automated `pre-commit` corrections with GitHub CI. (:pull:`965`).
 * Adjusted CI hooks to fail earlier if `lint` checks fail. (:pull:`972`).
-* `TrainAdjust` and `Adjust` object have a new `skip_input_checks` keyword arg to their `train` and  `adjust` methods. When `True`, all unit-, calendar- and coordinate-related input checks are skipped. This is an ugly solution to disappearing attributes when using `xr.map_blocks` with dask. (:pull:`964`).
+* `TrainAdjust` and `Adjust` object have a new `skip_input_checks` keyword arg to their `train` and `adjust` methods. When `True`, all unit-, calendar- and coordinate-related input checks are skipped. This is an ugly solution to disappearing attributes when using `xr.map_blocks` with dask. (:pull:`964`).
 * Some slow tests were marked `slow` to help speed up the standard test ensemble. (:pull:`969`).
     - Tox testing ensemble now also reports slowest tests using the ``--durations`` flag.
 * `pint` no longer emits warnings about redefined units when the `logging` module is loaded. (:issue:`990`, :pull:`991`).
@@ -807,7 +811,7 @@ New features and enhancements
 * ``xclim.core.utils._cal_perc`` is now only a proxy for ``xc.core.utils.nan_calc_percentiles`` with some axis moves.
 * `xclim` now implements many data quality assurance flags (``xclim.core.dataflags``) for temperature and precipitation based on `ICCLIM documentation guidelines <https://www.ecad.eu/documents/atbd.pdf>`_. These checks include the following:
     - Temperature (variables: ``tas``, ``tasmin``, ``tasmax``): ``tasmax_below_tasmin``, ``tas_exceeds_tasmax``, ``tas_below_tasmin``, ``temperature_extremely_low`` (`thresh="-90 degC"`), ``temperature_extremely_high`` (`thresh="60 degC"`).
-    - Precipitation-specific (variables: ``pr``, ``prsn``, ):  ``negative_accumulation_values``, ``very_large_precipitation_events`` (`thresh="300 mm d-1"`).
+    - Precipitation-specific (variables: ``pr``, ``prsn``, ): ``negative_accumulation_values``, ``very_large_precipitation_events`` (`thresh="300 mm d-1"`).
     - Wind-specific (variables: ``sfcWind``, ``wsgsmax``/``sfcWindMax``): ``wind_values_outside_of_bounds``
     - Generic: ``outside_n_standard_deviations_of_climatology``, ``values_repeating_for_n_or_more_days``, ``values_op_thresh_repeating_for_n_or_more_days``, ``percentage_values_outside_of_bounds``.
 
@@ -924,8 +928,8 @@ New features and enhancements
 * ``humidex`` can be computed using relative humidity instead of dewpoint temperature.
 * New ``sdba.construct_moving_yearly_window`` and ``sdba.unpack_moving_yearly_window`` for moving window adjustments.
 * New ``sdba.adjustment.NpdfTransform`` which is an adaptation of Alex Cannon's version of Pitié's *N-dimensional probability density function transform*. Uses new ``sdba.utils.rand_rot_matrix``. *Experimental, subject to changes.*
-* New ``sdba.processing.standardize``, ``.unstandardize`` and  ``.reordering``. All of them, tools needed to replicate Cannon's MBCn algorithm.
-* New ``sdba.processing.escore``, backed by  ``sdba.nbutils._escore`` to evaluate the performance of the N pdf transform.
+* New ``sdba.processing.standardize``, ``.unstandardize`` and ``.reordering``. All of them, tools needed to replicate Cannon's MBCn algorithm.
+* New ``sdba.processing.escore``, backed by ``sdba.nbutils._escore`` to evaluate the performance of the N pdf transform.
 * New function ``xclim.indices.clausius_clapeyron_scaled_precipitation`` can be used to scale precipitation according to changes in mean temperature.
 * Percentile based indices gained a ``bootstrap`` argument that applies a bootstrapping algorithm to reduce biases on exceedance frequencies computed over *in base* and *out of base* periods. *Experimental, subject to changes.*
 * Added a `.zenodo.json` file for collecting and maintaining author order and tracking ORCIDs.
@@ -1377,7 +1381,7 @@ v0.15.x (2020-03-12)
 --------------------
 * Improvement in FWI: Vectorization of DC, DMC and FFMC with numba and small code refactoring for better maintainability.
 * Added example notebook for creating a catalog of selected indices
-* Added `growing_season_end`, `last_spring_frost`, `dry_days`,  `hot_spell_frequency`, `hot_spell_max_length`, and `maximum_consecutive_frost_free_days` indices.
+* Added `growing_season_end`, `last_spring_frost`, `dry_days`, `hot_spell_frequency`, `hot_spell_max_length`, and `maximum_consecutive_frost_free_days` indices.
 * Dropped use of `fiona.crs` class in lieu of the newer pyproj CRS handler for `subset_shape` operations.
 * Complete internal reorganization of xclim.
 * Internationalization of xclim : add `locales` submodule for localized metadata.

diff --git a/docs/api.rst b/docs/api.rst
@@ -100,6 +100,12 @@ Ensembles Module
 .. autofunction:: xclim.ensembles.robustness_coefficient
    :noindex:
 
+.. automodule:: xclim.ensembles._partitioning
+    :noindex:
+
+.. autofunction:: xclim.ensembles.hawkins_sutton
+    :noindex:
+
 Units Handling Submodule
 ========================
 

diff --git a/docs/references.bib b/docs/references.bib
@@ -1882,6 +1882,107 @@ @article{vrac_multivariate_2018
 	year = {2018},
 }
 
+@article{yip_2011,
+abstract = {A simple and coherent framework for partitioning uncertainty in multimodel climate ensembles is presented. The analysis of variance (ANOVA) is used to decompose a measure of total variation additively into scenario uncertainty, model uncertainty, and internal variability. This approach requires fewer assumptions than existing methods and can be easily used to quantify uncertainty related to model–scenario interaction—the contribution to model uncertainty arising from the variation across scenarios of model deviations from the ensemble mean. Uncertainty in global mean surface air temperature is quantified as a function of lead time for a subset of the Coupled Model Intercomparison Project phase 3 ensemble and results largely agree with those published by other authors: scenario uncertainty dominates beyond 2050 and internal variability remains approximately constant over the twenty-first century. Both elements of model uncertainty, due to scenario-independent and scenario-dependent deviations from the ensemble mean, are found to increase with time. Estimates of model deviations that arise as by-products of the framework reveal significant differences between models that could lead to a deeper understanding of the sources of uncertainty in multimodel ensembles. For example, three models show a diverging pattern over the twenty-first century, while another model exhibits an unusually large variation among its scenario-dependent deviations.},
+author = {Yip, Stan and Ferro, Christopher A. T. and Stephenson, David B. and Hawkins, Ed},
+doi = {10.1175/2011JCLI4085.1},
+issn = {0894-8755},
+journal = {Journal of Climate},
+month = {sep},
+number = {17},
+pages = {4634--4643},
+title = {{A Simple, Coherent Framework for Partitioning Uncertainty in Climate Predictions}},
+url = {https://journals.ametsoc.org/doi/10.1175/2011JCLI4085.1},
+volume = {24},
+year = {2011}
+}
+
+@article{northrop_2014,
+abstract = {A simple statistical model is used to partition uncertainty from different sources, in projections of future climate from multimodel ensembles. Three major sources of uncertainty are considered: the choice of climate model, the choice of emissions scenario, and the internal variability of the modeled climate system. The relative contributions of these sources are quantified for mid- and late-twenty-first-century climate projections, using data from 23 coupled atmosphere–ocean general circulation models obtained from phase 3 of the Coupled Model Intercomparison Project (CMIP3). Similar investigations have been carried out recently by other authors but within a statistical framework for which the unbalanced nature of the data and the small number (three) of scenarios involved are potentially problematic. Here, a Bayesian analysis is used to overcome these difficulties. Global and regional analyses of surface air temperature and precipitation are performed. It is found that the relative contributions to uncertainty depend on the climate variable considered, as well as the region and time horizon. As expected, the uncertainty due to the choice of emissions scenario becomes more important toward the end of the twenty-first century. However, for midcentury temperature, model internal variability makes a large contribution in high-latitude regions. For midcentury precipitation, model internal variability is even more important and this persists in some regions into the late century. Implications for the design of climate model experiments are discussed.},
+author = {Northrop, Paul J. and Chandler, Richard E.},
+doi = {10.1175/JCLI-D-14-00265.1},
+issn = {0894-8755},
+journal = {Journal of Climate},
+month = {dec},
+number = {23},
+pages = {8793--8808},
+title = {{Quantifying Sources of Uncertainty in Projections of Future Climate*}},
+url = {http://journals.ametsoc.org/doi/10.1175/JCLI-D-14-00265.1},
+volume = {27},
+year = {2014}
+}
+
+@article{goldenson_2018,
+author = {Goldenson, N. and Mauger, G. and Leung, L. R. and Bitz, C. M. and Rhines, A.},
+doi = {10.1002/2017GL076297},
+issn = {0094-8276},
+journal = {Geophysical Research Letters},
+month = {jan},
+number = {2},
+pages = {926--934},
+title = {{Effects of Ensemble Configuration on Estimates of Regional Climate Uncertainties}},
+url = {https://onlinelibrary.wiley.com/doi/10.1002/2017GL076297},
+volume = {45},
+year = {2018}
+}
+
+@article{lehner_2020,
+author = {Lehner, Flavio and Deser, Clara and Maher, Nicola and Marotzke, Jochem and Fischer, Erich M. and Brunner, Lukas and Knutti, Reto and Hawkins, Ed},
+doi = {10.5194/esd-11-491-2020},
+issn = {2190-4987},
+journal = {Earth System Dynamics},
+month = {may},
+number = {2},
+pages = {491--508},
+title = {{Partitioning climate projection uncertainty with multiple large ensembles and CMIP5/6}},
+url = {https://esd.copernicus.org/articles/11/491/2020/},
+volume = {11},
+year = {2020}
+}
+
+@article{evin_2019,
+abstract = {The quantification of uncertainty sources in ensembles of climate projections obtained from combinations of different scenarios and climate and impact models is a key issue in climate impact studies. The small size of the ensembles of simulation chains and their incomplete sampling of scenario and climate model combinations makes the analysis difficult. In the popular single-time ANOVA approach for instance, a precise estimate of internal variability requires multiple members for each simulation chain (e.g., each emission scenario–climate model combination), but multiple members are typically available for a few chains only. In most ensembles also, a precise partition of model uncertainty components is not possible because the matrix of available scenario/models combinations is incomplete (i.e., projections are missing for many scenario–model combinations). The method we present here, based on data augmentation and Bayesian techniques, overcomes such limitations and makes the statistical analysis possible for single-member and incomplete ensembles. It provides unbiased estimates of climate change responses of all simulation chains and of all uncertainty variables. It additionally propagates uncertainty due to missing information in the estimates. This approach is illustrated for projections of regional precipitation and temperature for four mountain massifs in France. It is applicable for any kind of ensemble of climate projections, including those produced from ad hoc impact models.},
+author = {Evin, Guillaume and Hingray, Benoit and Blanchet, Juliette and Eckert, Nicolas and Morin, Samuel and Verfaillie, Deborah},
+doi = {10.1175/JCLI-D-18-0606.1},
+issn = {0894-8755},
+journal = {Journal of Climate},
+month = {apr},
+number = {8},
+pages = {2423--2440},
+title = {{Partitioning Uncertainty Components of an Incomplete Ensemble of Climate Projections Using Data Augmentation}},
+url = {http://journals.ametsoc.org/doi/10.1175/JCLI-D-18-0606.1},
+volume = {32},
+year = {2019}
+}
+
+@article{hawkins_2009,
+author = {Hawkins, Ed and Sutton, Rowan},
+doi = {10.1175/2009BAMS2607.1},
+issn = {0003-0007},
+journal = {Bulletin of the American Meteorological Society},
+month = {aug},
+number = {8},
+pages = {1095--1108},
+title = {{The Potential to Narrow Uncertainty in Regional Climate Predictions}},
+url = {https://journals.ametsoc.org/doi/10.1175/2009BAMS2607.1},
+volume = {90},
+year = {2009}
+}
+
+@article{hawkins_2011,
+author = {Hawkins, Ed and Sutton, Rowan},
+doi = {10.1007/s00382-010-0810-6},
+issn = {0930-7575},
+journal = {Climate Dynamics},
+month = {jul},
+number = {1-2},
+pages = {407--418},
+title = {{The potential to narrow uncertainty in projections of regional precipitation change}},
+url = {http://link.springer.com/10.1007/s00382-010-0810-6},
+volume = {37},
+year = {2011}
+}
+
 @article{sivakumar_predicting_1998,
 title = {Predicting rainy season potential from the onset of rains in Southern Sahelian and Sudanian climatic zones of West Africa},
 journal = {Agricultural and Forest Meteorology},

diff --git a/tests/test_filters.py b/tests/test_filters.py
diff --git a/tests/test_partitioning.py b/tests/test_partitioning.py
@@ -0,0 +1,72 @@
+from __future__ import annotations
+
+import numpy as np
+import xarray as xr
+
+from xclim.ensembles import hawkins_sutton
+from xclim.ensembles._filters import _concat_hist, _model_in_all_scens, _single_member
+
+
+def test_hawkins_sutton_smoke(open_dataset):
+    """Just a smoke test."""
+    dims = {"run": "member", "scen": "scenario"}
+    da = (
+        open_dataset(
+            "uncertainty_partitioning/cmip5_pr_global_mon.nc", branch="hawkins_sutton"
+        )
+        .pr.sel(time=slice("1950", None))
+        .rename(dims)
+    )
+    da1 = _model_in_all_scens(da)
+    dac = _concat_hist(da1, scenario="historical")
+    das = _single_member(dac)
+    hawkins_sutton(das)
+
+
+def test_hawkins_sutton_synthetic():
+    """Test logic of Hawkins-Sutton's implementation using synthetic data."""
+
+    # Time, scenario, model
+    # Here the scenarios don't change over time, so there should be no model variability (since it's relative to the
+    # reference period.
+    sm = np.arange(10, 41, 10)  # Scenario mean
+    mm = np.arange(-6, 7, 1)  # Model mean
+    mean = mm[np.newaxis, :] + sm[:, np.newaxis]
+
+    # Natural variability
+    r = np.random.randn(4, 13, 60)
+
+    x = r + mean[:, :, np.newaxis]
+    time = xr.date_range("1970-01-01", periods=60, freq="Y")
+    da = xr.DataArray(x, dims=("scenario", "model", "time"), coords={"time": time})
+    m, v = hawkins_sutton(da)
+    # Mean uncertainty over time
+    vm = v.mean(dim="time")
+
+    # Check that the mean relative to the baseline is zero
+    np.testing.assert_array_almost_equal(m.mean(dim="time"), 0, decimal=1)
+
+    # Check that the scenario uncertainty is zero
+    np.testing.assert_array_almost_equal(vm.sel(uncertainty="scenario"), 0, decimal=1)
+
+    # Check that model uncertainty > variability
+    assert vm.sel(uncertainty="model") > vm.sel(uncertainty="variability")
+
+    # Smoke test with polynomial of order 2
+    fit = da.polyfit(dim="time", deg=2, skipna=True)
+    sm = xr.polyval(coord=da.time, coeffs=fit.polyfit_coefficients).where(da.notnull())
+    hawkins_sutton(da, sm=sm)
+
+    # Test with a multiplicative variable and time evolving scenarios
+    r = np.random.randn(4, 13, 60) + np.arange(60)
+    x = r + mean[:, :, np.newaxis]
+    da = xr.DataArray(x, dims=("scenario", "model", "time"), coords={"time": time})
+    m, v = hawkins_sutton(da, kind="*")
+    su = v.sel(uncertainty="scenario")
+    # We expect the scenario uncertainty to grow over time
+    # The scenarios all have the same absolute slope, but since their reference mean is different, the relative increase
+    # is not the same and this creates a spread over time across "relative" scenarios.
+    assert (
+        su.sel(time=slice("2020", None)).mean()
+        > su.sel(time=slice("2000", "2010")).mean()
+    )
diff --git a/xclim/ensembles/__init__.py b/xclim/ensembles/__init__.py
@@ -10,6 +10,7 @@
 from __future__ import annotations
 
 from ._base import create_ensemble, ensemble_mean_std_max_min, ensemble_percentiles
+from ._partitioning import hawkins_sutton
 from ._reduce import (
     kkz_reduce_ensemble,
     kmeans_reduce_ensemble,