2022
DOI: 10.48550/arxiv.2205.05633
Preprint

Improved decision making with similarity based machine learning

Abstract: Despite their fundamental importance for science and society at large, experimental design decisions are often plagued by extreme data scarcity, which severely hampers the use of modern ready-made machine learning models as they rely heavily on the paradigm 'the bigger the data the better'. Presenting similarity based machine learning, we show how to reduce these data needs such that decision making can be objectively improved in certain problem classes. After introducing similarity machine learning for the har…

Cited by 6 publications (7 citation statements) | References 15 publications
“…Lowest lying conformer structures were sampled using the CREST [55] algorithm, using the GFN2-xTB/GFN-FF composite method in a meta-dynamics based sampling scheme, with a final relaxation at the GFN2-xTB level. Adding all successfully generated structures to QM9, a total pool size of 56…”
Section: Data (mentioning)
confidence: 99%
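For context, the pool-generation workflow described in the quoted passage (a CREST metadynamics conformer search at the GFN2-xTB//GFN-FF composite level, followed by a GFN2-xTB relaxation) can be scripted roughly as below. This is a minimal sketch, not the cited authors' code; it assumes the `crest` and `xtb` binaries are on the PATH, and the flag spellings (`-gfn2//gfnff`, `--opt`, `--gfn 2`) and output file names should be checked against the installed CREST/xtb documentation.

```python
# Hedged sketch of the quoted workflow: CREST metadynamics conformer search using the
# GFN2-xTB//GFN-FF composite method, then a final GFN2-xTB relaxation of the lowest
# conformer. Binary names, flags, and output file names are assumptions.
import subprocess
from pathlib import Path

def sample_conformers(xyz_file: str, workdir: str = "crest_run", threads: int = 4) -> Path:
    """Run a CREST conformer search; return the lowest-lying conformer structure."""
    wd = Path(workdir)
    wd.mkdir(exist_ok=True)
    subprocess.run(
        ["crest", str(Path(xyz_file).resolve()), "-gfn2//gfnff", "-T", str(threads)],
        cwd=wd, check=True,
    )
    return wd / "crest_best.xyz"   # default CREST name for the best conformer

def relax_gfn2(xyz_file: Path, workdir: str = "xtb_opt") -> Path:
    """Final geometry relaxation at the GFN2-xTB level."""
    wd = Path(workdir)
    wd.mkdir(exist_ok=True)
    subprocess.run(
        ["xtb", str(xyz_file.resolve()), "--opt", "--gfn", "2"],
        cwd=wd, check=True,
    )
    return wd / "xtbopt.xyz"       # default xtb name for the optimized geometry

if __name__ == "__main__":
    best = sample_conformers("molecule.xyz")
    relaxed = relax_gfn2(best)
    print("Relaxed structure written to", relaxed)
```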
“…To analyse the influence of noise and number of candidates on the elucidation success, we applied Gaussian noise to the ¹³C shift predictions. Note that increasing the maximum candidate pool size N_QM9 leads to an offset of the trend towards less permissible errors. A possible explanation is the correlation of the density of chemical space with increasing numbers of candidate spectra N [56]. As shift predictions need to become more accurate, limiting N through prior knowledge of the chemical space could be beneficial. Similar findings have been reported by Sridharan et al. [41], noting that brute force enumerations of chemical space lead to worse rankings than constrained graph generation.…”
Section: Spectra Matching Accuracy With Synthetic Noise (mentioning)
confidence: 99%
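To make the quoted noise analysis concrete, the sketch below simulates the elucidation task: each of N candidate structures carries a vector of predicted shifts, the "experimental" spectrum is the true candidate's shifts perturbed by Gaussian noise, and success means the true candidate is ranked first by RMSD. The function names, the RMSD ranking criterion, and the toy shift values are illustrative assumptions, not the cited authors' implementation.

```python
# Illustrative sketch (not the cited authors' code): estimate elucidation success as a
# function of the noise level and the candidate pool size N.
import numpy as np

rng = np.random.default_rng(0)

def elucidation_success_rate(pool: np.ndarray, sigma: float, trials: int = 200) -> float:
    """pool: (N, n_shifts) predicted shift vectors for N candidate structures.
    Each trial picks a true candidate, perturbs its shifts by Gaussian noise of
    width sigma (ppm) to mimic an experimental spectrum, and checks whether RMSD
    ranking recovers the true candidate at rank 1."""
    n_correct = 0
    for _ in range(trials):
        true_idx = rng.integers(len(pool))
        query = pool[true_idx] + rng.normal(0.0, sigma, size=pool.shape[1])
        rmsd = np.sqrt(((pool - query) ** 2).mean(axis=1))
        n_correct += int(np.argmin(rmsd) == true_idx)
    return n_correct / trials

# Toy demonstration: larger candidate pools tolerate less prediction error.
for n_candidates in (100, 1000, 10000):
    pool = rng.uniform(0.0, 200.0, size=(n_candidates, 9))  # synthetic shifts in ppm
    print(n_candidates, elucidation_success_rate(pool, sigma=2.0))
```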
“…The important thing is that the ML model leads to samples in the right neighborhood. One framework is similarity-based kernel learning, in which one can define a cost function associated with acquiring a desired (but difficult) data point versus several similar (but more easily acquired) data points, and then use a model trained on the local environment to infer the desired point. The ease of acquisition can be computed by combining materials, labor, and time constraints.…”
Section: Recommendations Toward ML For Exceptional Materials (mentioning)
confidence: 99%
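The quoted idea (train on cheap, similar neighbors; infer the expensive target; compare acquisition costs) can be sketched with ordinary kernel ridge regression. Everything below, including the Gaussian kernel on feature vectors and the cost bookkeeping, is an illustrative assumption rather than the framework of the cited work.

```python
# Illustrative sketch of similarity-based (kernel) learning for a costly target point:
# fit a Gaussian-kernel ridge model on cheap, similar neighbors, infer the expensive
# point, then compare the acquisition costs of the two routes. Assumed names/numbers.
import numpy as np

def gaussian_kernel(A: np.ndarray, B: np.ndarray, length_scale: float) -> np.ndarray:
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale**2))

def krr_predict(X_train, y_train, X_query, length_scale=1.0, reg=1e-6):
    K = gaussian_kernel(X_train, X_train, length_scale)
    alpha = np.linalg.solve(K + reg * np.eye(len(X_train)), y_train)
    return gaussian_kernel(X_query, X_train, length_scale) @ alpha

# Cheap, similar neighbors of the desired point (features + measured labels).
X_cheap = np.array([[0.9, 1.1], [1.1, 0.9], [1.0, 1.2], [1.2, 1.0]])
y_cheap = np.array([2.0, 2.1, 2.3, 2.2])
x_desired = np.array([[1.0, 1.0]])          # the difficult-to-acquire point

y_inferred = krr_predict(X_cheap, y_cheap, x_desired)
print("inferred value at the desired point:", y_inferred[0])

# Toy acquisition costs (materials + labor + time, arbitrary units): four cheap
# measurements versus one direct, difficult measurement.
cost_cheap_route = 4 * 1.0
cost_direct = 10.0
print("cheap-neighbors route cheaper?", cost_cheap_route < cost_direct)
```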
“…decision making within experimental design problems [81]. Other future work could also involve the use of more sophisticated unsupervised ML methods to find new and potentially better classification rules, based on more complex combinations of functional groups, or other molecular features. It is not obvious to us if it is generally possible to identify advantageous structural features (leading to similar improvements in QML model accuracy) for any arbitrary property, or if our findings are rather restricted to an exclusive list of observables.…”
Section: Paper (mentioning)
confidence: 99%
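As a rough illustration of the kind of unsupervised analysis this passage envisions, one could featurize molecules by functional-group counts and cluster them, then inspect which group combinations dominate each cluster. The sketch below uses RDKit fragment counters and scikit-learn's KMeans purely as one possible realization; the descriptor choice, cluster count, and example SMILES are assumptions, not the cited work's method.

```python
# Hypothetical sketch: cluster molecules by functional-group counts to look for
# candidate classification rules (one possible reading of the quoted suggestion).
import numpy as np
from rdkit import Chem
from rdkit.Chem import Fragments
from sklearn.cluster import KMeans

# A few example molecules (assumed SMILES, for illustration only).
smiles = ["CCO", "CC(=O)O", "c1ccccc1O", "CCN", "CC(=O)N", "c1ccccc1C(=O)O"]
mols = [Chem.MolFromSmiles(s) for s in smiles]

# Simple functional-group descriptors from RDKit's Fragments module.
descriptors = [Fragments.fr_Al_OH, Fragments.fr_COO, Fragments.fr_NH2, Fragments.fr_amide]
X = np.array([[d(m) for d in descriptors] for m in mols], dtype=float)

# Unsupervised grouping; the cluster centers hint at which functional-group
# combinations separate the molecules.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for smi, label in zip(smiles, km.labels_):
    print(label, smi)
print("cluster centers (fr_Al_OH, fr_COO, fr_NH2, fr_amide):")
print(km.cluster_centers_)
```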