2023
DOI: 10.1021/acs.jcim.3c00373

Characterizing Uncertainty in Machine Learning for Chemistry

Abstract: Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predic…

Cited by 29 publications (14 citation statements)
References 84 publications
“…Random splits are not always ideal for testing molecule data sets. In order to test the generalizability of molecule representations, many approaches attempt to split molecules by molecular scaffolds. For DELs, rather than generic molecular scaffolding strategies, synthons provide a natural grouping and separation of the chemical space.…”
Section: Results
confidence: 99%
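The grouping idea in the excerpt above can be sketched generically: molecules that share a group label (a molecular scaffold, or a DEL synthon) are assigned wholesale to either train or test, so the test set probes generalization to unseen chemistry. The data, the scaffold labels, and the `group_split` helper below are illustrative assumptions, not taken from the cited work.

```python
# Minimal sketch of a group-based (scaffold- or synthon-style) data split.
# Whole groups go to one side only, so no scaffold leaks across the split.
from collections import defaultdict

def group_split(items, group_of, test_fraction=0.25):
    """Fill the training set with the largest groups first; the remaining
    (rarer) groups become the test set. `group_of` maps item -> group label."""
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)
    ordered = sorted(groups.values(), key=len, reverse=True)
    train, test = [], []
    train_target = (1 - test_fraction) * len(items)
    for members in ordered:
        (train if len(train) < train_target else test).extend(members)
    return train, test

# Toy SMILES with hand-assigned scaffold labels (illustrative only):
mols = ["c1ccccc1O", "c1ccccc1N", "C1CCCCC1", "CCO", "CCN", "CCC"]
scaffold = {"c1ccccc1O": "benzene", "c1ccccc1N": "benzene",
            "C1CCCCC1": "cyclohexane",
            "CCO": "chain", "CCN": "chain", "CCC": "chain"}
train, test = group_split(mols, scaffold.get, test_fraction=0.25)
```

Sending the largest groups to training mirrors common scaffold-split practice: abundant scaffolds are easy to learn, while rare ones form a harder, more realistic test set.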
“…Note that the performance of an ML approach is directly influenced and limited by the quality (especially the noisiness) of the reference data (ref 64). PBE0 is a generally robust functional that usually yields good NMR properties, and the SO-relativistic variant in particular has proven reliable in our previous studies on 29Si (ref 10) and 119Sn (ref 11) NMR chemical shifts. Furthermore, in contrast to full four-component relativistic methods, SO-ZORA-PBE0 is still feasible for the medium-sized (>40 atoms) molecules included in the data set.…”
Section: Methods
confidence: 99%
“…The information included as input is of central importance for the quality and performance of the ML model (ref 64). In the case of ΔSO-ML, the input feature vector is constructed such that it contains information about the geometric (solely from the three-dimensional structure), electronic (from the converged density matrix of the DFT single-point calculation), and magnetic (from the DFT NMR shielding constant calculation) surroundings of each atom of interest. The majority of the descriptors in these categories were taken from the Δcorr-ML model, and some were omitted.…”
Section: Methods
confidence: 99%
“…There are two sources of uncertainty in machine learning for chemistry: aleatoric uncertainty and epistemic uncertainty (ref 49). Aleatoric uncertainty is associated with noise in the training data and is not reducible during training, while epistemic uncertainty is associated with model bias and variance and is reducible. The original SIDT algorithm (ref 26) extends the tree until exhaustion.…”
Section: Prepruning Methods
confidence: 99%
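The aleatoric/epistemic split described in the excerpts can be illustrated with a small ensemble-style sketch. This is my own illustration under the law of total variance, not the paper's or SIDT's implementation: each ensemble member is assumed to predict a mean and a noise variance, the average predicted noise variance estimates the aleatoric (irreducible) part, and the spread of the mean predictions estimates the epistemic (reducible) part. All numbers below are made up.

```python
# Hedged sketch: law-of-total-variance decomposition of predictive uncertainty
# for one test point, given an ensemble of mean/variance-predicting models.
from statistics import fmean, pvariance

def decompose_uncertainty(member_means, member_noise_vars):
    """aleatoric  = average of members' predicted noise variances
                    (reflects noise in the training data; not reducible)
       epistemic  = variance of the members' mean predictions
                    (reflects model shortcomings; reducible)"""
    aleatoric = fmean(member_noise_vars)
    epistemic = pvariance(member_means)
    return aleatoric, epistemic, aleatoric + epistemic

# Five hypothetical ensemble members predicting the same molecule's property:
means = [1.9, 2.1, 2.0, 2.2, 1.8]
noise_vars = [0.05, 0.04, 0.06, 0.05, 0.05]
alea, epi, total = decompose_uncertainty(means, noise_vars)
# alea ≈ 0.05, epi ≈ 0.02, total ≈ 0.07
```

Note how shrinking the disagreement between members (e.g., with more training data) drives `epi` toward zero, while `alea` stays fixed by the data noise, matching the reducible/irreducible distinction in the excerpt.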