2019
DOI: 10.1177/1177932219871263

On the Upper Bounds of the Real-Valued Predictions

Abstract: Predictions are fundamental in science because they allow theories to be tested and falsified. Predictions are ubiquitous in bioinformatics and also help when no first principles are available. Predictions can be divided into classification (when a label is associated with a given input) and regression (when a real value is assigned). Different scores are used to assess the performance of regression predictors; the most widely adopted include the mean square error, the Pearson correlation (ρ), and the coefficient of…

Cited by 14 publications (28 citation statements)
References 9 publications
“…Some studies provided a theoretical estimation of the upper bound of the Pearson correlation as a function of the average uncertainty of the data (σ) and the standard deviation of the dataset (see Fig. 1C) [61], [62]. As an example, the popular datasets S2648 and VariBench have a σ_DB < 2 kcal/mol, leading to an upper bound for the Pearson correlation coefficient of ~0.8 and a lower bound for the root mean square error between experimental and predicted ΔΔG values of ~1 kcal/mol.…”
Section: Best Practice and Pitfalls In Prediction Assessment (mentioning)
confidence: 99%
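
The statement above ties the best achievable Pearson correlation and root mean square error to the measurement uncertainty and the spread of the dataset. A minimal sketch of why noise imposes such limits, assuming a simple model that is not taken from the cited papers: each measurement equals a true value plus independent Gaussian noise, and two replicate experiments on the same items are compared with each other. The names `true_sd`, `noise_sd`, `simulate_replicates`, and `analytic_values` are hypothetical, and the input values are illustrative rather than estimates for any real dataset.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_replicates(true_sd, noise_sd, n=50_000, seed=0):
    """Simulate two noisy replicate measurements of the same true values.

    Assumption (ours): measurement = true value + independent Gaussian
    noise with standard deviation noise_sd.
    Returns (Pearson correlation, RMSE) between the two replicates.
    """
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, true_sd) for _ in range(n)]
    exp_a = [t + rng.gauss(0.0, noise_sd) for t in truth]
    exp_b = [t + rng.gauss(0.0, noise_sd) for t in truth]
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(exp_a, exp_b)) / n)
    return pearson(exp_a, exp_b), rmse


def analytic_values(true_sd, noise_sd):
    """Closed-form values for the same additive-noise model.

    corr(exp_a, exp_b) = true_sd^2 / (true_sd^2 + noise_sd^2)
    rmse(exp_a, exp_b) = sqrt(2) * noise_sd
    """
    r = true_sd ** 2 / (true_sd ** 2 + noise_sd ** 2)
    return r, math.sqrt(2.0) * noise_sd


if __name__ == "__main__":
    # Illustrative inputs only.
    print("simulated:", simulate_replicates(2.0, 1.0))
    print("analytic: ", analytic_values(2.0, 1.0))
```

Under this model, the agreement between two replicate experiments depends only on the ratio of noise variance to true variance, so a predictor scored against one noisy experiment cannot be expected to agree with it better than a second experiment would. The exact definitions of σ and σ_DB in the cited work may differ from the parameters used here.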
“…Since , the upper bound for becomes , and we refer to this upper bound as hereafter. Despite claims that no ML model could perform better than this upper bound, 15 , 16 by comparing the equations for and , it is clear that estimates are lower than both estimates as well as achieved ML model performance.…”
Section: Theoretical Analysis (mentioning)
confidence: 97%
“…Methods to estimate this upper bound are underdeveloped, although some progress has been made recently. 15,16 Moreover, the resources invested into model development have diminishing returns on model performance as one approaches the upper bound. Knowing the best expected MSE or R² (i.e.…”
Section: Introduction (mentioning)
confidence: 99%
“…Regarding the results presented in Figures 3 and 4, it is worth noticing that the maximum achievable Pearson's correlation is not necessarily equal to 1, as usually thought. It may be far lower depending on the experimental uncertainty and the ∆∆G distributions [36,37]. In particular, when considering the different experiments on the same variants included in the Protherm database or in manually-cleaned datasets, the expected Pearson upper bound is in the range of 0.70-0.85 [36].…”
Section: Prediction Of the Experimental ∆∆G Values (mentioning)
confidence: 99%