2023
DOI: 10.1039/d2dd00113f

Quantifying the performance of machine learning models in materials discovery

Abstract: The predictive capabilities of machine learning (ML) models used in materials discovery are typically measured using simple statistics such as the root-mean-square error (RMSE) or the coefficient of determination (r²)...

citations
Cited by 21 publications
(29 citation statements)
references
References 55 publications
Citation types: 1 supporting, 28 mentioning, 0 contrasting
“…If the goal is just discovery of a single new material, then knowledge of the probability of success is likely more important than an estimation of the total number of materials under consideration in the design space. These above considerations mesh well with some key conclusions drawn by Borg et al [18] when they examined sequential active learning campaigns with single fidelity data, where they found that the ML model performance for materials discovery depended strongly on the target range of the property distribution which yields success, and whether one is interested in a single discovery or many discoveries. The second aspect to keep in mind when approaching a new problem of this type is the strength of the correlation between low- and high-fidelity data.…”
Section: Additional Discussion
Citation type: supporting
confidence: 82%
“…First, the discovery yield is the fraction of materials which pass the success criterion (high-fidelity bandgap in range of 1.1-1.7 eV) compared to the total pool of considered materials (here, 4709 different compounds). Note, this is the same discovery yield metric as defined and discussed in the work by Borg et al [18]. The data acquisition ratio (a) is an input parameter for the campaign, and is the number of low-fidelity measurements performed per high-fidelity measurement in a single loop of the discovery campaign, a = #LF/#HF.…”
Section: Methods
Citation type: mentioning
confidence: 99%
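The two quantities defined in the statement above can be illustrated with a minimal sketch. It follows only the wording of the quoted passage: discovery yield as the fraction of the candidate pool that meets the success criterion, and the data acquisition ratio as a = #LF/#HF. The function names, the inclusive bounds on the target range, and the band gap values are assumptions made for illustration and are not taken from the cited work.

```python
def discovery_yield(high_fidelity_values, lower, upper):
    """Fraction of the candidate pool whose high-fidelity value lies in the target range.

    Inclusive bounds are an assumption; the quoted passage only gives the range.
    """
    if not high_fidelity_values:
        raise ValueError("candidate pool is empty")
    hits = sum(1 for value in high_fidelity_values if lower <= value <= upper)
    return hits / len(high_fidelity_values)


def acquisition_ratio(n_low_fidelity, n_high_fidelity):
    """a = #LF / #HF for a single loop of the discovery campaign."""
    if n_high_fidelity <= 0:
        raise ValueError("at least one high-fidelity measurement is required")
    return n_low_fidelity / n_high_fidelity


# Hypothetical high-fidelity band gaps in eV (made-up values, not the 4709-compound pool).
band_gaps_ev = [0.4, 1.2, 1.65, 2.3, 1.1, 3.0, 0.9, 1.5]

# Success criterion from the quoted passage: band gap between 1.1 and 1.7 eV.
yield_fraction = discovery_yield(band_gaps_ev, lower=1.1, upper=1.7)

# Hypothetical loop with 10 low-fidelity and 2 high-fidelity measurements.
ratio = acquisition_ratio(n_low_fidelity=10, n_high_fidelity=2)

print(f"discovery yield = {yield_fraction:.2f}, acquisition ratio a = {ratio:.1f}")
```

For the pool quoted in the statement, the denominator would be the 4709 compounds under consideration.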
“…† Notably, the purely-exploratory random acquisition performs as well as MU for building minimal datasets, consistent with previous reports comparing model accuracy as a function of SL iteration using similar acquisition functions. 55 An expanded comparison of acquisition functions (including another baseline, a "space-filling" strategy, in addition to random search) for the three SL-related tasks of finding optimal candidates, surfacing high-quality candidates, and building minimal datasets for training ML surrogates, can be found in the ESI. †…”
Section: Surrogatization Of Compute-intensive Simulations
Citation type: mentioning
confidence: 99%
“…Additional metrics such as "discovery" scores are being developed to evaluate a model's capability to propose new high-performing materials. 59 Further details on supervised and unsupervised learning models, model training, validation, evaluation, and interpretation can be found in a user-guide to machine learning for materials design by Gormley and coworkers. 16 Other helpful resources include the software QSARINS, which focuses on multiple linear regression modeling and includes tools for data preprocessing, validation, outlier detection, and visualization 60 and polyBERT, an end-to-end machine learning pipeline for polymer informatics and optimization.…”
Section: Modeling and Leveraging Screening Outputs: How Can This Libr...
Citation type: mentioning
confidence: 99%
“…In the first step of model development, many different models are fit to the same data set to determine the best performance. Model performance is traditionally quantified through prediction error such as root-mean-square error (RMSE), where 0 is theoretical perfect performance in a noise-free data set. Additional metrics such as “discovery” scores are being developed to evaluate a model’s capability to propose new high-performing materials. Further details on supervised and unsupervised learning models, model training, validation, evaluation, and interpretation can be found in a user-guide to machine learning for materials design by Gormley and co-workers.…”
Section: Workflow Design To Unveil Structure–property Relationships I...
Citation type: mentioning
confidence: 99%
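The RMSE mentioned in the statement above can be sketched in a few lines, using the standard definition (square root of the mean squared difference between predictions and reference values). The function name and the sample values are illustrative assumptions, not data from any of the cited papers.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between reference values and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up property values for illustration only.
reference = [1.0, 2.0, 3.0, 4.0]
perfect_model = [1.0, 2.0, 3.0, 4.0]   # reproduces the reference exactly
biased_model = [1.5, 2.5, 3.5, 4.5]    # constant offset of 0.5

print(rmse(reference, perfect_model))  # 0.0, the perfect-performance limit
print(rmse(reference, biased_model))   # 0.5, every prediction is off by 0.5
```

A single RMSE value summarizes average error over the whole data set, which is why the statements above point to complementary "discovery" metrics for judging how well a model proposes high-performing candidates.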