Releases: stan-dev/loo

loo v2.8.0

03 Jul 20:27
  • make E_loo Pareto-k diagnostic more robust by @avehtari in #251
  • update psis paper reference by @avehtari in #252
  • update PSIS references in vignettes by @jgabry in #254
  • fix loo_moment_match p_loo computation by @avehtari in #257
  • fix loo_moment_matching NaN issue by @avehtari in #259
  • catch Stan log_prob exceptions inside moment matching by @avehtari in #262
  • Fix E_loo_khat error when posterior::pareto_khat returns NA by @jgabry in #264
  • update psis ref + some minor typo fixes by @avehtari in #266
  • update PSIS ref + link to Nabiximols study for Jacobian correction by @avehtari in #267
  • Fix issue with pareto_khat output no longer being a list by @n-kall in #269
  • fix equations in loo-glossary by @avehtari in #268

New Contributors

Full Changelog: v2.7.0...v2.8.0

loo v2.7.0

25 Feb 19:38

Major changes

  • New sample size specific diagnostic threshold for Pareto k.
    The pre-2022 version of the PSIS paper recommended diagnostic thresholds of
    k < 0.5 "good"
    0.5 <= k < 0.7 "ok"
    0.7 <= k < 1 "bad"
    k>=1 "very bad"
    The 2022 revision of the PSIS paper now recommends
    k < min(1 - 1/log10(S), 0.7) "good"
    min(1 - 1/log10(S), 0.7) <= k < 1 "bad"
    k > 1 "very bad"
    where S is the sample size.
    There is now one fewer diagnostic threshold ("ok" has been removed), and the
    most important threshold now depends on the sample size S. For sample sizes
    100, 320, 1000, 2200, and 10000, the sample-size-specific part 1 - 1/log10(S)
    gives thresholds of 0.5, 0.6, 0.67, 0.7, and 0.75.
    Even as the sample size grows, the bias in the PSIS estimate dominates when
    0.7 <= k < 1, so the diagnostic threshold for "good" is capped at
    0.7 (if k >= 1, the mean does not exist and bias is not a valid measure).
    The new recommended thresholds are based on a more careful bias-variance analysis
    of PSIS based on truncated Pareto sums theory. For those who use the Stan
    default of 4000 posterior draws, the threshold stays at 0.7, but there will
    be fewer warnings, as there is no longer a diagnostic message for 0.5 <= k < 0.7.
    Those who use smaller sample sizes may see diagnostic messages with a
    threshold less than 0.7; increasing the sample size to about 2200 brings
    the threshold up to 0.7.

  • There are no more warnings when the r_eff argument is not provided, and the
    default is now r_eff = 1. The summary print output showing MCSE and ESS now
    also shows diagnostic information on the range of r_eff. The change was made to
    reduce unnecessary warnings. The use of r_eff does not change the expected
    value of elpd_loo, p_loo, and Pareto k, and is needed only to estimate
    MCSE and ESS. Thus it is better to show the diagnostic information about r_eff
    only when MCSE and ESS values are shown.
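For illustration, the sample-size-specific threshold described above can be computed directly in R. This is a minimal sketch of the formula as stated in these notes, not the package's own code; the helper names khat_threshold() and classify_khat() are hypothetical, and labelling a value of exactly k = 1 as "very bad" is an assumption, since the notes leave that boundary ambiguous.

```r
# Sample-size-specific "good" threshold for Pareto k, as quoted above:
# min(1 - 1/log10(S), 0.7), where S is the number of posterior draws.
khat_threshold <- function(S) {
  pmin(1 - 1 / log10(S), 0.7)
}

# Hypothetical helper: label k values with the categories listed above.
# k == 1 is labelled "very bad" here (an assumption of this sketch).
classify_khat <- function(k, S) {
  ifelse(k < khat_threshold(S), "good",
         ifelse(k < 1, "bad", "very bad"))
}

S <- c(100, 320, 1000, 2200, 4000, 10000)
print(round(1 - 1 / log10(S), 2))   # uncapped: 0.50 0.60 0.67 0.70 0.72 0.75
print(round(khat_threshold(S), 2))  # capped:   0.50 0.60 0.67 0.70 0.70 0.70

print(classify_khat(c(0.3, 0.65, 0.8, 1.2), S = 1000))
print(classify_khat(0.65, S = 100))
```

With S = 1000 the cut-off is about 0.67, so k = 0.65 counts as good, while the same k with S = 100 (cut-off 0.5) counts as bad.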

Other changes

New Contributors

Full Changelog: v2.6.0...v2.7.0

loo v2.6.0

31 Mar 13:08

New features

  • New loo_predictive_metric() function for computing estimates of leave-one-out
    predictive metrics: mean absolute error, mean squared error and root mean
    squared error for continuous predictions, and accuracy and balanced accuracy for
    binary classification. (#202, @LeeviLindgren)

  • New functions crps(), scrps(), loo_crps(), and loo_scrps() for
    computing the (scaled) continuous ranked probability score. (#203, @LeeviLindgren)

  • New vignette "Mixture IS leave-one-out cross-validation for high-dimensional Bayesian models." This is a demonstration of the mixture estimators proposed by Silva and Zanella (2022). (#210)
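The CRPS can be estimated from predictive draws using two independent sets of draws, x and x2, from the predictive distribution: E|X - y| is estimated by mean(abs(x - y)) and E|X - X'| by mean(abs(x - x2)). The sketch below does this in base R for a single observation; the function names are hypothetical rather than the package's API, the scaled version follows the scaled CRPS of Bolin and Wallin, and writing both scores so that higher is better is an assumption, since sign conventions differ between sources.

```r
# Hypothetical helpers: sample-based estimates of the CRPS and scaled CRPS
# for a single observation y, given two independent vectors of predictive
# draws x and x2 (same length). Both are oriented so that higher is better.
crps_sketch <- function(x, x2, y) {
  EXX <- mean(abs(x - x2))  # estimates E|X - X'|
  EXy <- mean(abs(x - y))   # estimates E|X - y|
  0.5 * EXX - EXy
}

scrps_sketch <- function(x, x2, y) {
  EXX <- mean(abs(x - x2))
  EXy <- mean(abs(x - y))
  -EXy / EXX - 0.5 * log(EXX)
}

set.seed(1)
x  <- rnorm(4000)  # draws from a N(0, 1) predictive distribution
x2 <- rnorm(4000)  # an independent second set of draws
crps_sketch(x, x2, y = 0)  # observation near the centre: higher (better) score
crps_sketch(x, x2, y = 3)  # observation in the tail: lower (worse) score
```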

Bug fixes

  • Minor fix to model names displayed by loo_model_weights() to make them consistent with loo_compare(). (#217)

loo v2.5.1

24 Mar 16:05
  • Fix R CMD check error on M1 Mac as requested by CRAN

loo v2.5.0

16 Mar 17:27

Improvements

  • New Frequently Asked Questions page on the package website. (#143)

  • Speed improvement from simplifying the normalization when fitting the
    generalized Pareto distribution. (#187, @sethaxen)

  • Added parallel likelihood computation to speed up loo_subsample() when using posterior approximations. (#171, @kdubovikov)

  • Switch unit tests from Travis to GitHub Actions. (#164)

Bug fixes

  • Fixed a bug causing the normalizing constant of the PSIS (log) weights not
    to get updated when performing moment matching with save_psis = TRUE (#166, @fweber144).

loo v2.4.0

05 Dec 01:44

Bug fixes

  • Fixed a bug in relative_eff.function() that caused an error on Windows when
    using multiple cores. (#152)

  • Fixed a potential numerical issue in loo_moment_match() with split=TRUE. (#153)

  • Fixed potential integer overflow with loo_moment_match(). (#155, @ecmerkle)

  • Fixed relative_eff() when used with a posterior::draws_array. (#161, @rok-cesnovar)

New features

  • New generic function elpd() (and methods for matrices and arrays) for
    computing expected log predictive density of new data or log predictive density
    of observed data. A new vignette demonstrates using this function when doing
    K-fold CV with rstan. (#159, @bnicenboim)
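For a matrix of log-likelihood values with one row per posterior draw and one column per observation, the pointwise log predictive density is the log of the likelihood averaged over draws. A minimal sketch of that computation in base R, using the log-sum-exp trick for numerical stability, is below; the helper names are hypothetical and this is not the package's implementation. Applied to held-out observations (as in K-fold CV) it estimates the elpd; applied to the observed data it gives the in-sample lpd, which is typically optimistic.

```r
# log(mean(exp(v))) computed stably: factor out the maximum before exponentiating.
log_mean_exp <- function(v) {
  m <- max(v)
  m + log(mean(exp(v - m)))
}

# Hypothetical helper: pointwise lpd_i = log((1/S) * sum_s exp(log_lik[s, i]))
# for an S x N matrix (rows = posterior draws, columns = observations).
pointwise_lpd <- function(log_lik) {
  apply(log_lik, 2, log_mean_exp)
}

# Toy example: 2 draws, 2 observations (likelihood values chosen by hand).
log_lik <- matrix(log(c(0.2, 0.4, 0.1, 0.3)), nrow = 2)
lpd_i <- pointwise_lpd(log_lik)
sum(lpd_i)  # equals log(0.3) + log(0.2) = log(0.06)

# Very small likelihoods: the naive formula underflows to -Inf.
tiny <- c(-1000, -1001)
log(mean(exp(tiny)))  # -Inf
log_mean_exp(tiny)    # finite
```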

loo v2.3.1

14 Jul 22:53
  • Fix a bug in loo_moment_match() that prevented ... arguments from being used
    correctly. (#149)

loo v2.3.0

07 Jul 21:35
  • Added Topi Paananen (@topipa) and Paul Bürkner (@paul-buerkner) as coauthors.

  • New function loo_moment_match() (and new vignette), which can be used to
    update a loo object when Pareto k estimates are large. (#130)

  • The log weights provided by the importance sampling functions psis(),
    tis(), and sis() no longer have the largest log ratio subtracted from them
    when returned to the user. This should be less confusing for anyone using
    the weights() method to make an importance sampler. (#112, #146)

  • MCSE calculation is now deterministic. (#116, #147)
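Because the returned log weights are no longer shifted by their maximum, code that turns them into an importance sampler has to normalize them itself. A minimal sketch of the usual stable approach in base R follows: subtract the maximum before exponentiating (the shift cancels in the normalization), then divide by the sum. The helper name and the toy draws are hypothetical; whether the package's weights() method already normalizes depends on its arguments, so check its documentation.

```r
# Hypothetical helper: self-normalized importance weights from log weights.
# Shifting by max(lw) prevents exp() from overflowing; the shift is a common
# factor exp(-max(lw)) that cancels when dividing by the sum.
normalize_log_weights <- function(lw) {
  w <- exp(lw - max(lw))
  w / sum(w)
}

lw <- c(1000, 1001, 999)           # large log weights
naive <- exp(lw) / sum(exp(lw))     # exp(1000) overflows to Inf, giving NaN
w <- normalize_log_weights(lw)      # finite and sums to 1

# Adding a constant to the log weights leaves the normalized weights unchanged.
all.equal(w, normalize_log_weights(lw - 1000))

# Self-normalized importance-sampling estimate of E[theta] from toy draws.
theta <- c(0.5, 1.5, -0.2)          # hypothetical draws matching lw above
sum(w * theta)
```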

loo v2.2.0

19 Dec 16:13

loo 2.2.0

See release notes below or at mc-stan.org/loo/news.

(GitHub issue/PR number in parentheses)

  • Added Mans Magnusson (@MansMeg) as a coauthor.

  • New functions loo_subsample() and loo_approximate_posterior() (and new
    vignette) for doing PSIS-LOO with large data. (#113)

  • Added support for standard importance sampling and truncated importance
    sampling (functions sis() and tis()). (#125)

  • compare() now throws a deprecation warning suggesting loo_compare(). (#93)

  • A smaller threshold is used when checking the uniqueness of tail values. (#124)

  • For WAIC, warnings are only thrown when running waic() and not when printing
    a waic object. (#117, @mcol)

  • Use markdown syntax in roxygen documentation wherever possible. (#108)
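To illustrate the difference between standard and truncated importance sampling, the sketch below computes both kinds of normalized weights from a vector of log ratios. The truncation rule shown, capping raw weights at sqrt(S) times their mean as in Ionides' truncated importance sampling, is an assumption about the method and not a description of the package's sis() and tis() code; the helper names are hypothetical.

```r
# Hypothetical helpers. Standard importance sampling: normalized exp(log ratios).
sis_weights <- function(log_ratios) {
  w <- exp(log_ratios - max(log_ratios))  # shift for numerical stability
  w / sum(w)
}

# Truncated importance sampling: cap raw weights at sqrt(S) * mean(w) before
# normalizing. The cap scales with w, so the max-shift above does not affect it.
tis_weights <- function(log_ratios) {
  S <- length(log_ratios)
  w <- exp(log_ratios - max(log_ratios))
  w <- pmin(w, sqrt(S) * mean(w))
  w / sum(w)
}

log_ratios <- log(c(1, 1, 1, 100))  # one draw dominates
round(sis_weights(log_ratios), 3)   # 0.010 0.010 0.010 0.971
round(tis_weights(log_ratios), 3)   # 0.018 0.018 0.018 0.945
```

Truncation lowers the largest weight, trading a small bias for lower variance.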

loo v2.1.0

14 Mar 04:58

See release notes below or at mc-stan.org/loo/news.

Installation

Install from CRAN:

install.packages("loo")

Install from GitHub:

devtools::install_github("stan-dev/loo", ref = "v2.1.0")  

Release notes

  • New function loo_compare() for model comparison that will eventually replace
    the existing compare() function. (#93)

  • New vignette on LOO for non-factorizable joint Gaussian models. (#75)

  • New vignette on "leave-future-out" cross-validation for time series models. (#90)

  • New glossary page (use help("loo-glossary")) with definitions of key terms. (#81)

  • New se_diff column in model comparison results. (#78)

  • Improved stability of psis() when log_ratios are very small. (#74)

  • Allow r_eff=NA to suppress warning when specifying r_eff is not applicable
    (i.e., draws not from MCMC). (#72)

  • Update effective sample size calculations to match RStan's version. (#85)

  • Naming of k-fold helper functions now matches scikit-learn. (#96)
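The se_diff column can be understood through the pointwise elpd contributions of the two models being compared. A minimal sketch, assuming the common estimator sqrt(N * var(d)) for the N pointwise differences d, is below; the helper name is hypothetical and not the package's API. Working with paired differences matters because the pointwise contributions of two models fitted to the same data are usually correlated, so the standard error of the difference is not simply sqrt(se_a^2 + se_b^2).

```r
# Hypothetical helper: elpd difference between models A and B and its
# standard error, from their pointwise elpd contributions (length N each).
elpd_diff_se <- function(pointwise_a, pointwise_b) {
  stopifnot(length(pointwise_a) == length(pointwise_b))
  d <- pointwise_a - pointwise_b         # paired, per-observation differences
  c(elpd_diff = sum(d),                  # total difference in elpd
    se_diff = sqrt(length(d) * var(d)))  # SE of a sum of N terms
}

a <- c(-1.0, -2.0, -3.0, -4.0)
b <- c(-1.5, -2.0, -3.5, -4.0)
elpd_diff_se(a, b)  # elpd_diff = 1, se_diff = sqrt(1/3), about 0.577
```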