2020
DOI: 10.1063/5.0006204
Probabilistic performance estimators for computational chemistry methods: Systematic improvement probability and ranking probability matrix. II. Applications

Abstract: In the first part of this study (Paper I), we introduced the systematic improvement probability (SIP) as a tool to assess the level of improvement on absolute errors to be expected when switching between two computational chemistry methods. We also developed two indicators based on robust statistics to address the uncertainty of ranking in computational chemistry benchmarks: P_inv, the inversion probability between two values of a statistic, and P_r, the ranking probability matrix. In this second part, these…
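The SIP described in the abstract can be read as the probability that switching methods reduces the absolute error on a given system. A minimal sketch of an empirical estimate on paired absolute errors — the synthetic data and the simple plug-in estimator are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired absolute errors of two methods on the same benchmark
# set (synthetic stand-ins; method 2 is made somewhat more accurate).
abs_err_m1 = np.abs(rng.normal(0.0, 1.0, size=500))
abs_err_m2 = np.abs(rng.normal(0.0, 0.7, size=500))

# Empirical SIP: fraction of systems for which switching from method 1
# to method 2 strictly reduces the absolute error.
sip = np.mean(abs_err_m2 < abs_err_m1)
```

With these illustrative scales, `sip` comes out above 0.5, i.e. switching improves more systems than it degrades; Paper I develops this idea with proper uncertainty treatment rather than this bare point estimate.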

Cited by 9 publications (15 citation statements)
References 31 publications
“…In some instances, prediction errors due to model inadequacy can be handled by statistical correction of predictions, which may provide a reliable uncertainty measure [20]. Various surrogate methods have been developed for the estimation of prediction uncertainty, such as bootstrap-based methods, Gaussian process regression, neural networks and deep learning ensembles [21–23]. Gaussian process regression has been employed to identify particular calculations within a given dataset for which the uncertainties exceed a given threshold [24,25].…”
Section: Introduction
confidence: 99%
“…This lack of correlation supports the main message of this work: The number of fitted parameters does not represent an effective measure of the transferability of a functional. More reliable statistical criteria—such as those developed in this work, or alternatively, the probabilistic performance estimators recently introduced by Pernot and Savin [91,92]—should be used to evaluate the reliability of new and existing xc functionals.…”
Section: Statistical Criteria Of Bias–variance Tradeoff and Analysis
confidence: 99%
“…We can relate this to an increasing trend of the errors with the bandgap value. 4 In the case of PER2018, the error distributions also present large skewness and kurtosis, which can be associated with the chemical heterogeneity of the dataset. 2 For THA2015, it was noted previously 35,4 that some experimental reference data with large measurement uncertainty could not be reproduced by any method in the studied set.…”
Section: Application Of G M Cf To Ranking
confidence: 99%
“…As a benchmarking statistic, the popular mean unsigned error (MUE) bears no information on such a risk. [1–4] We have recently reported a case where two unbiased error distributions with identical values of the MUE present widely different risks of large errors because of heavy tails in one of them. 5,4 It would therefore be very useful to complement the MUE with a statistic indicating or quantifying the risk of large errors.…”
Section: Introduction
confidence: 99%
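The point made in the statement above — identical MUE, very different risk of large errors — is easy to reproduce numerically. A small numpy sketch with two illustrative unbiased distributions (a normal and a heavy-tailed Student-t, not the paper's actual data), rescaled to share the same MUE:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Two illustrative unbiased error distributions: normal vs heavy-tailed
# Student-t (df=3). Rescale the t sample so both have the same MUE.
normal_err = rng.normal(0.0, 1.0, size=n)
t_err = rng.standard_t(df=3, size=n)
t_err *= np.mean(np.abs(normal_err)) / np.mean(np.abs(t_err))

mue_normal = np.mean(np.abs(normal_err))
mue_t = np.mean(np.abs(t_err))  # matched to mue_normal by construction

# Same MUE, but the heavy-tailed distribution carries a much larger
# probability of errors exceeding a fixed threshold.
p_large_normal = np.mean(np.abs(normal_err) > 3.0)
p_large_t = np.mean(np.abs(t_err) > 3.0)
```

Despite matched MUE values, `p_large_t` exceeds `p_large_normal` by roughly an order of magnitude here, which is exactly why the quoted authors argue for complementing the MUE with a tail-risk statistic.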