Three recommendations for evaluating climate predictions

Fricker, Thomas E.; Ferro, Christopher A. T.; Stephenson, David B.

doi:10.1002/met.1409

Cited by 38 publications

(40 citation statements)

References 52 publications

“…(3) is not applicable for uncertain observations. This is the same for the suggestion of Fricker et al (2013), to not only calculate and verify predictions of perennial averages but also to conduct temporal pooling of the hindcasts for a particular period. This would be particularly useful from a climate impact perspective, as it is often a single winter exhibiting extreme frequency of intense cyclones, associated with large economic losses.…”

Section: Summary and Discussion (mentioning)

confidence: 84%

“…Thus, in line with the argument of Fricker et al (2013), it would be of great value to verify the predictions of these shorter time-scale predictands, but the development of an alternative estimator of an unbiased RPS - applicable for any kind of observation not necessarily constituting the Heaviside step function - is beyond the scope of this study. However, such a development would be crucial for a fair assessment of any kind of probabilistic forecast skill based on the Brier score (BS), the RPS, or the continuous ranked probability score (CRPS).…”

Section: Summary and Discussion (mentioning)

confidence: 92%
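As a rough illustration of the score named in the statement above, the sketch below computes the standard RPS for a single forecast over three ordered categories (decreased, normal, enhanced) when the observation is known with certainty, so that its cumulative distribution is a step function. The function name and the example probabilities are hypothetical, and this is the plain RPS, not the unbiased estimator the quoted study refers to.

```python
from itertools import accumulate


def ranked_probability_score(forecast_probs, observed_category):
    """Standard (unnormalised) RPS for one forecast; lower is better.

    forecast_probs: probabilities for ordered categories, summing to 1.
    observed_category: zero-based index of the category that occurred.
    """
    n = len(forecast_probs)
    if abs(sum(forecast_probs) - 1.0) > 1e-9:
        raise ValueError("forecast probabilities must sum to 1")
    if not 0 <= observed_category < n:
        raise ValueError("observed_category is out of range")

    # Cumulative forecast probabilities over the ordered categories.
    cum_forecast = list(accumulate(forecast_probs))
    # A certain observation has a step-function cumulative distribution:
    # 0 below the observed category, 1 from the observed category upward.
    cum_observed = [1.0 if k >= observed_category else 0.0 for k in range(n)]

    return sum((f - o) ** 2 for f, o in zip(cum_forecast, cum_observed))


if __name__ == "__main__":
    # Hypothetical forecast: 20% decreased, 50% normal, 30% enhanced.
    probs = [0.2, 0.5, 0.3]
    for category, label in enumerate(["decreased", "normal", "enhanced"]):
        score = ranked_probability_score(probs, category)
        print(f"observed {label}: RPS = {score:.2f}")
```

A forecast that puts all of its probability on the observed category scores 0. The step-function line is the part that would need to change for uncertain observations, which is the development the statement describes as out of scope.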

Evaluating decadal predictions of northern hemispheric cyclone frequencies

Kruschke, Rust, Kadow et al. 2014

Tellus A: Dynamic Meteorology and Oceanography

Abstract: Mid-latitudinal cyclones are a key factor for understanding regional anomalies in primary meteorological parameters such as temperature or precipitation. Extreme cyclones can produce notable impacts on human society and economy, for example, by causing enormous economic losses through wind damage. Based on 41 annually initialised (1961-2001) hindcast ensembles, this study evaluates the ability of a single-model decadal forecast system (MPI-ESM-LR) to provide skilful probabilistic three-category forecasts (enhanced, normal or decreased) of winter (ONDJFM) extra-tropical cyclone frequency over the Northern Hemisphere with lead times from 1 yr up to a decade. It is shown that these predictions exhibit some significant skill, mainly for lead times of 2-5 yr, especially over the North Atlantic and Pacific. Skill for intense cyclones is generally higher than for all detected systems. A comparison of decadal hindcasts from two different initialisation techniques indicates that initialising from reanalysis fields yields slightly better results for the first forecast winter (month 10-15), while initialisation based on an assimilation experiment provides better skill for lead times between 2 and 5 yr. The reasons and mechanisms behind this predictive skill are subject to future work. Preliminary analyses suggest a strong relationship of the model's skill over the North Atlantic with the ability to predict upper ocean temperatures modulating lower troposphere baroclinicity for the respective area and time scales.

“…While the importance of using proper scores is well recognised (Bröcker and Smith 2007; Fricker et al 2013), researchers often face requests to present results under a variety of scores. Indeed in the context of meteorological forecast evaluation there are several recommendations in the literature (Nurmi 2003; Randall et al 2007; World Meteorological Organization 2008; Fricker et al 2013; Goddard et al 2013), although often with little discussion of which attributes different scores aim to quantify, or their strengths and weaknesses in a particular forecast setting. By convention, a lower score is taken to reflect a better forecast.…”

Section: Measuring Forecast Performance (mentioning)

confidence: 99%

“…It is useful to speak of the "True" distribution from which the outcome is drawn (hereafter, Q) without assuming that such a distribution exists in all cases of interest. Given a proper score, a forecast system providing Q will be preferred whenever it is included amongst those under consideration (Bröcker and Smith 2007; Fricker et al 2013). When this is not the case, then even proper scores may rank two forecast systems differently, making it difficult to provide definitive statements about forecast quality.…”

Section: Measuring Forecast Performance (mentioning)

confidence: 99%
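The property described in the statement above, that a proper score prefers the forecast Q, can be checked numerically in the simplest case of a binary event. The sketch below is an illustration under stated assumptions, not code from any of the cited papers: the outcome occurs with "true" probability q, a forecaster issues probability p, and the expected Brier score and expected logarithmic (Ignorance) score are evaluated over a grid of p. All function names and the grid search are hypothetical choices.

```python
import math


def expected_brier(p, q):
    """Expected Brier score of forecast p when the event has probability q."""
    return q * (p - 1.0) ** 2 + (1.0 - q) * p ** 2


def expected_ignorance(p, q):
    """Expected logarithmic score (in bits) of forecast p under probability q."""
    return -(q * math.log2(p) + (1.0 - q) * math.log2(1.0 - p))


def best_forecast(expected_score, q, grid_size=99):
    """Forecast probability on a regular grid that minimises the expected score."""
    # Grid of probabilities strictly inside (0, 1): 0.01, 0.02, ..., 0.99.
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(grid, key=lambda p: expected_score(p, q))


if __name__ == "__main__":
    q = 0.3  # hypothetical "true" probability of the event
    print("Brier minimiser:    ", best_forecast(expected_brier, q))
    print("Ignorance minimiser:", best_forecast(expected_ignorance, q))
```

Both scores are lower-is-better, and under these assumptions both expected scores are smallest when p equals q. When Q is not among the candidate forecasts, the two scores need not agree on which of the remaining forecasts is better, which is the difficulty the statement points to.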

Towards improving the framework for probabilistic forecast evaluation

et al. 2015

The evaluation of forecast performance plays a central role both in the interpretation and use of forecast systems and in their development. Different evaluation measures (scores) are available, often quantifying different characteristics of forecast performance. The properties of several proper scores for probabilistic forecast evaluation are contrasted and then used to interpret decadal probability hindcasts of global mean temperature. The Continuous Ranked Probability Score (CRPS), Proper Linear (PL) score, and I. J. Good's logarithmic score (also referred to as Ignorance) are compared; although information from all three may be useful, the logarithmic score has an immediate interpretation and is not insensitive to forecast busts. Neither CRPS nor PL is local; this is shown to produce counterintuitive evaluations by CRPS. Benchmark forecasts from empirical models like Dynamic Climatology place the scores in context. Comparing scores for forecast systems based on physical models (in this case HadCM3, from the CMIP5 decadal archive) against such benchmarks is more informative than internal comparison of systems based on similar physical simulation models with each other. It is shown that a forecast system based on HadCM3 outperforms Dynamic Climatology in decadal global mean temperature hindcasts; Dynamic Climatology previously outperformed a forecast system based upon HadGEM2 and reasons for these results are suggested. Forecasts of aggregate data (5-year means of global mean temperature) are, of course, narrower than forecasts of annual averages due to the suppression of variance; while the average "distance" between the forecasts and a target may be expected to decrease, little if any discernible improvement in probabilistic skill is achieved.
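To make the contrast between CRPS and the logarithmic score more concrete, the sketch below evaluates both for a Gaussian forecast distribution, using the well-known closed form of the CRPS for a normal distribution. The Gaussian assumption, the function names, and the example values are illustrative choices and are not taken from the abstract above.

```python
import math


def gaussian_crps(y, mu, sigma):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for outcome y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def gaussian_ignorance(y, mu, sigma):
    """Logarithmic score in bits: minus log2 of the forecast density at y."""
    z = (y - mu) / sigma
    density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return -math.log2(density)


if __name__ == "__main__":
    mu, sigma = 0.0, 1.0  # hypothetical forecast mean and spread
    for error_in_sigmas in (0.0, 2.0, 6.0):
        y = mu + error_in_sigmas * sigma
        print(
            f"outcome {error_in_sigmas:.0f} sigma from the mean: "
            f"CRPS = {gaussian_crps(y, mu, sigma):.3f}, "
            f"Ignorance = {gaussian_ignorance(y, mu, sigma):.2f} bits"
        )
```

Under this assumption the logarithmic score depends only on the forecast density at the outcome (it is local) and grows quadratically with the standardised error, so an outcome far in the tail, a forecast bust, is penalised heavily. The CRPS depends on the whole forecast distribution and grows roughly linearly with the error.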

“…We may also wish to describe the performance of a set of predictions in terms of an aggregated measure such as a correlation coefficient or reliability statistic (e.g. Ferro and Fricker 2012;Fricker et al 2013). Some decision makers may prefer predictions of performance to be expressed qualitatively, as in 'the error of this climate prediction will probably be small', while others may prefer quantitative predictions, as in 'the error of this climate prediction will be less than 1…”

Section: Judging Credibility (mentioning)

confidence: 99%

On judging the credibility of climate predictions

et al. 2013

Self Cite

Incorporating a prediction into future planning and decision making is advisable only if we have judged the prediction's credibility. This is notoriously difficult and controversial in the case of predictions of future climate. By reviewing epistemic arguments about climate model performance, we discuss how to make and justify judgments about the credibility of climate predictions. We propose a new bounding argument that justifies basing such judgments on the past performance of possibly dissimilar prediction problems. This encourages a more explicit use of data in making quantitative judgments about the credibility of future climate predictions, and in training users of climate predictions to become better judges of credibility. We illustrate the approach using decadal predictions of annual mean, global mean surface air temperature.
