Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spei, proper use of loc parameter in scipy dists #1653

Closed
wants to merge 18 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 9 additions & 2 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,20 @@ Changelog

v0.49.0 (unreleased)
--------------------
Contributors to this version: Trevor James Smith (:user:`Zeitsperre`), Pascal Bourgault (:user:`aulemahal`), Juliette Lavoie (:user:`juliettelavoie`).
Contributors to this version: Trevor James Smith (:user:`Zeitsperre`), Pascal Bourgault (:user:`aulemahal`), Juliette Lavoie (:user:`juliettelavoie`), Éric Dupuis (:user:`coxipi`).

Announcements
^^^^^^^^^^^^^
* `xclim` has migrated its development branch name from `master` to `main`. (:issue:`1667`, :pull:`1669`).

New features and enhancements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Distributions with negative values are directly fitted without need for an offset for distributions such as `gamma, fisk` in ``xclim.indices.standardized_precipitation_evapotranspiration_index``. (:issue:`1477` :pull:`1653`).

Breaking changes
^^^^^^^^^^^^^^^^
* Argument `offset` is removed from ``xclim.indices.standardized_precipitation_evapotranspiration_index``. This will slightly change results when using SPEI with default values (:issue:`1477` :pull:`1653`).

Bug fixes
^^^^^^^^^
* Fixed an bug in sdba's ``map_groups`` that prevented passing DataArrays with cftime coordinates if the ``sdba_encode_cf`` option was True. (:issue:`1673`, :pull:`1674`).
Expand All @@ -21,7 +29,6 @@ Internal changes
^^^^^^^^^^^^^^^^
* Added "doymin" and "doymax" to the possible operations of ``generic.stats``. Fixed a warning issue when ``op`` was "integral". (:pull:`1672`).


v0.48.2 (2024-02-26)
--------------------
Contributors to this version: Juliette Lavoie (:user:`juliettelavoie`).
Expand Down
3 changes: 2 additions & 1 deletion tests/test_indices.py
Original file line number Diff line number Diff line change
Expand Up @@ -695,14 +695,15 @@ def test_standardized_precipitation_evapotranspiration_index(
tas = tasmax - 2.5
tasmin = tasmax - 5
wb = xci.water_budget(pr, None, tasmin, tasmax, tas)
# offset like in climate indices library
wb = wb + convert_units_to("1 mm/d", wb, context="hydro")

params = xci.stats.standardized_index_fit_params(
wb.sel(time=slice("1950", "1980")),
freq=freq,
window=window,
dist=dist,
method=method,
offset="1 mm/d",
)
spei = xci.standardized_precipitation_evapotranspiration_index(
wb.sel(time=slice("1998", "2000")), params=params
Expand Down
118 changes: 38 additions & 80 deletions xclim/indices/_agro.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,11 +24,7 @@
)
from xclim.indices.generic import aggregate_between_dates, get_zones
from xclim.indices.helpers import _gather_lat, day_lengths
from xclim.indices.stats import (
preprocess_standardized_index,
standardized_index,
standardized_index_fit_params,
)
from xclim.indices.stats import standardized_index

# Frequencies : YS: year start, QS-DEC: seasons starting in december, MS: month start
# See https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
Expand Down Expand Up @@ -1094,16 +1090,15 @@ def _get_rain_season(pram):

@declare_units(
pr="[precipitation]",
pr_cal="[precipitation]",
params="[]",
)
def standardized_precipitation_index(
pr: xarray.DataArray,
pr_cal: Quantified | None = None,
freq: str | None = "MS",
window: int = 1,
dist: str = "gamma",
method: str = "APP",
fitkwargs: dict = {},
cal_start: DateStr | None = None,
cal_end: DateStr | None = None,
params: Quantified | None = None,
Expand All @@ -1115,9 +1110,6 @@ def standardized_precipitation_index(
----------
pr : xarray.DataArray
Daily precipitation.
pr_cal : xarray.DataArray, optional
Daily precipitation used for calibration. Usually this is a temporal subset of `pr` over some reference period.
This option will be removed in xclim >=0.47.0. Two behaviour will be possible (see below)
freq : str, optional
Resampling frequency. A monthly or daily frequency is expected. Option `None` assumes that desired resampling
has already been applied input dataset and will skip the resampling step.
Expand All @@ -1129,6 +1121,8 @@ def standardized_precipitation_index(
method : {'APP', 'ML'}
Name of the fitting method, such as `ML` (maximum likelihood), `APP` (approximate). The approximate method
uses a deterministic function that doesn't involve any optimization.
fitkwargs : dict
Kwargs passed to ``xclim.indices.stats.fit`` used to impose values of certains parameters (`floc`, `fscale`).
cal_start : DateStr, optional
Start date of the calibration period. A `DateStr` is expected, that is a `str` in format `"YYYY-MM-DD"`.
Default option `None` means that the calibration period begins at the start of the input dataset.
Expand Down Expand Up @@ -1189,75 +1183,33 @@ def standardized_precipitation_index(
----------
:cite:cts:`mckee_relationship_1993`
"""
if params is not None and pr_cal is None:
freq, window, indexer = (
params.attrs[s] for s in ["freq", "window", "time_indexer"]
)
# Unpack attrs to None and {} if needed
freq = None if freq == "" else freq
indexer = {} if indexer[0] == "" else {indexer[0]: indexer[1:]}
if cal_start or cal_end:
warnings.warn(
"Expected either `cal_{start|end}` or `params`, got both. The `params` input overrides other inputs."
"If `cal_start`, `cal_end`, `freq`, `window`, and/or `dist` were given as input, they will be ignored."
dist_methods = {"gamma": ["ML", "APP", "PWM"], "fisk": ["ML", "APP"]}
if dist in dist_methods.keys():
if method not in dist_methods[dist]:
raise NotImplementedError(
f"{method} method is not implemented for {dist} distribution"
)

if pr_cal is not None:
warnings.warn(
"Inputting a calibration array will be deprecated in xclim>=0.47.0. "
"For example, if `pr_cal` is a subset of `pr`, then instead of say:\n"
"`standardized_precipitation_index(pr=pr,pr_cal=pr.sel(time=slice(t0,t1)),...)`,\n"
"one can call:\n"
"`standardized_precipitation_index(pr=pr,cal_range=(t0,t1),...).\n"
"If for some reason `pr_cal` is not a subset of `pr`, then the following approach will still be possible:\n"
"`params = standardized_index_fit_params(da=pr_cal, freq=freq, window=window, dist=dist, method=method)`.\n"
"`spi = standardized_precipitation_index(pr=pr, params=params)`.\n"
"This approach can be used in both scenarios to break up the computations in two,"
"i.e. get params, then compute standardized indices."
)
params = standardized_index_fit_params(
pr_cal, freq=freq, window=window, dist=dist, method=method, **indexer
)

pr, _ = preprocess_standardized_index(pr, freq=freq, window=window, **indexer)
if params is None:
params = standardized_index_fit_params(
pr.sel(time=slice(cal_start, cal_end)),
freq=None,
window=1,
dist=dist,
method=method,
)

# If params only contains a subset of main dataset time grouping
# (e.g. 8/12 months, etc.), it needs to be broadcasted
template = pr.groupby(params.attrs["group"]).first()
paramsd = {k: v for k, v in params.sizes.items() if k != "dparams"}
if paramsd != template.sizes:
params = params.broadcast_like(template)

spi = standardized_index(pr, params)
spi.attrs = params.attrs
spi.attrs["freq"] = (freq or xarray.infer_freq(spi.time)) or "undefined"
spi.attrs["window"] = window
spi.attrs["units"] = ""
else:
raise NotImplementedError(f"{dist} distribution is not implemented yet")
spi = standardized_index(
pr, freq, window, dist, method, fitkwargs, cal_start, cal_end, params, **indexer
)
return spi


@declare_units(
wb="[precipitation]",
wb_cal="[precipitation]",
offset="[precipitation]",
params="[]",
)
def standardized_precipitation_evapotranspiration_index(
wb: xarray.DataArray,
wb_cal: Quantified | None = None,
freq: str | None = "MS",
window: int = 1,
dist: str = "gamma",
method: str = "APP",
offset: Quantified = "1.000 mm/d",
fitkwargs: dict = {},
offset: Quantified = "",
cal_start: DateStr | None = None,
cal_end: DateStr | None = None,
params: Quantified | None = None,
Expand All @@ -1273,9 +1225,6 @@ def standardized_precipitation_evapotranspiration_index(
----------
wb : xarray.DataArray
Daily water budget (pr - pet).
wb_cal : xarray.DataArray, optional
Daily water budget used for calibration. Usually this is a temporal subset of `wb` over some reference period.
This option will be removed in xclim >=0.47.0. Two behaviours will be possible (see below).
freq : str, optional
Resampling frequency. A monthly or daily frequency is expected. Option `None` assumes that desired resampling
has already been applied input dataset and will skip the resampling step.
Expand All @@ -1289,6 +1238,12 @@ def standardized_precipitation_evapotranspiration_index(
`PWM` (probability weighted moments).
The approximate method uses a deterministic function that doesn't involve any optimization. Available methods
vary with the distribution: 'gamma':{'APP', 'ML', 'PWM'}, 'fisk':{'APP', 'ML'}
fitkwargs : dict
Kwargs passed to ``xclim.indices.stats.fit`` used to impose values of certains parameters (`floc`, `fscale`).
offset : Quantified
For distributions bounded by zero (e.g. "gamma", "fisk"), the two-parameters distributions only accept positive
values. An offset can be added to make sure this is the case. This option will be removed in xclim >=0.49.0, ``xclim``
will rely on three-parameters distributions instead.
cal_start : DateStr, optional
Start date of the calibration period. A `DateStr` is expected, that is a `str` in format `"YYYY-MM-DD"`.
Default option `None` means that the calibration period begins at the start of the input dataset.
Expand All @@ -1299,11 +1254,6 @@ def standardized_precipitation_evapotranspiration_index(
Fit parameters.
The `params` can be computed using ``xclim.indices.stats.standardized_index_fit_params`` in advance.
The output can be given here as input, and it overrides other options.
offset : Quantified
For distributions bounded by zero (e.g. "gamma", "fisk"), an offset must be added to the water budget
to make sure there are no negative values.
Keep the offset as small as possible to minimize its influence on the results.
This can be given as a precipitation flux or a rate.
\*\*indexer
Indexing parameters to compute the indicator on a temporal subset of the data.
It accepts the same arguments as :py:func:`xclim.indices.generic.select_time`.
Expand All @@ -1323,7 +1273,9 @@ def standardized_precipitation_evapotranspiration_index(

See Standardized Precipitation Index (SPI) for more details on usage.
"""
uses_default_offset = offset == "1.000 mm/d"
uses_default_offset = offset != ""
if uses_default_offset is False:
warnings.warn("Inputting an offset will be deprecated in xclim>=0.49.0. ")
if params is not None:
params_offset = params.attrs["offset"]
if uses_default_offset is False and offset != params_offset:
Expand All @@ -1337,20 +1289,26 @@ def standardized_precipitation_evapotranspiration_index(
if dist in ["gamma", "fisk"] and offset <= 0:
raise ValueError(
"The water budget must be shifted towards positive values to be used with `gamma` and `fisk` "
"distributions which are bounded by zero. A positive offset is required. Current value: "
f"{offset}{wb.attrs['units']}."
"distributions which are bounded by zero (only when using two-parameters distributions: in xclim>=0.49.0,"
"three-parameters distributions are used to accommodate negative values). Only positive offsets are accepted."
)

# Note that the default behaviour would imply an offset for any distribution, even those distributions
# that can accommodate negative values of the water budget. This needs to be changed in future versions
# of the index.
if offset != 0:
with xarray.set_options(keep_attrs=True):
wb = wb + offset
if wb_cal is not None:
wb_cal = wb_cal + offset

spei = standardized_precipitation_index(
wb, wb_cal, freq, window, dist, method, cal_start, cal_end, params, **indexer
dist_methods = {"gamma": ["ML", "APP", "PWM"], "fisk": ["ML", "APP"]}
if dist in dist_methods.keys():
if method not in dist_methods[dist]:
raise NotImplementedError(
f"{method} method is not implemented for {dist} distribution"
)
else:
raise NotImplementedError(f"{dist} distribution is not implemented yet")
spei = standardized_index(
wb, freq, window, dist, method, fitkwargs, cal_start, cal_end, params, **indexer
)

return spei
Expand Down
Loading
Loading