2018
DOI: 10.1039/c8me00012c

Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery

Abstract: Traditional machine learning (ML) metrics overestimate model performance for materials discovery. We introduce (1) leave-one-cluster-out cross-validation (LOCO CV) and (2) a simple nearest-neighbor benchmark to show that model performance in discovery applications strongly depends on the problem, data sampling, and extrapolation. Our results suggest that ML-guided iterative experimentation may outperform standard high-throughput screening for discovering breakthrough materials like high-Tc superconductors with …
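
The nearest-neighbor benchmark named in the abstract can be illustrated with a minimal sketch. It assumes the benchmark is a 1-nearest-neighbor predictor that copies the target value of the closest training point; the abstract does not spell out the details. The function name, the Euclidean distance, and the toy data are all hypothetical.

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the target of the training point closest to x (Euclidean distance)."""
    best_index = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best_index]

# Hypothetical toy data: one feature per sample and made-up target values.
train_X = [(0.0,), (1.0,), (5.0,)]
train_y = [10.0, 20.0, 90.0]

# The query 4.2 is closest to the training point 5.0, so its target is copied.
baseline_prediction = nearest_neighbor_predict(train_X, train_y, (4.2,))
```

A trained model that cannot beat this kind of baseline on held-out data is probably interpolating between near-duplicates rather than learning anything transferable.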

Cited by 211 publications (199 citation statements)
References 32 publications
“…However, if only one group of superconductors was used as training data, the models failed to predict other groups, as shown in Figure 17b–e. This finding is consistent with a later study by Meredig et al., where the authors noted that the superconductivity data formed distinct groups and the error of the models on a given group would increase sharply if no training data from this group was included in the model. The conclusion seems to suggest that different mechanisms are at play for different superconductor groups and in fact is known in the community.…”
Section: Application; citation type: supporting
confidence: 91%
“…Many materials problems involve the search for materials that possess rare combinations of properties or extraordinary properties, while such materials may not exist in the available data space. Hence a highly accurate ML model trained on K-fold CV may not generalize well to novel material classes …”
Section: Model Selection and Training; citation type: mentioning
confidence: 99%
“…Different methods ranging from a simple holdout, over k-fold cross-validation, leave-one-out cross-validation, Monte Carlo cross-validation, 72 up to leave-one-cluster-out cross-validation 73 can be used for the evaluation. All these methods rely on keeping some data hidden from the model during the training process.…”
Section: Fig 1 Supervised Learning Workflow; citation type: mentioning
confidence: 99%
“…While this can be advantageous, it also means that a sample is not guaranteed to be in the test/training set. Leave-one-cluster-out cross-validation 73 was specifically developed for materials science and estimates the ability of the machine learning model to extrapolate to novel groups of materials that were not present in the training data. Depending on the target quantity, this allows for a more realistic evaluation and a better understanding of the limitations of the machine learning model.…”
Section: Fig 1 Supervised Learning Workflow; citation type: mentioning
confidence: 99%
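
The splitting scheme described in the statement above can be sketched as follows. This is an illustrative sketch, not the cited authors' implementation. It assumes every sample already carries a cluster label (for example from a clustering step run on the input features); the function name and the family-style labels are hypothetical.

```python
from collections import defaultdict

def leave_one_cluster_out(cluster_labels):
    """Yield (held_out_cluster, train_indices, test_indices), one split per cluster."""
    members = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        members[label].append(index)
    for held_out in sorted(members):
        # Every sample from the held-out cluster goes to the test set,
        # so the model never sees that group during training.
        test_indices = members[held_out]
        train_indices = [i for i, label in enumerate(cluster_labels) if label != held_out]
        yield held_out, train_indices, test_indices

# Hypothetical cluster assignments for five samples.
cluster_labels = ["cuprate", "iron-based", "cuprate", "conventional", "iron-based"]
splits = list(leave_one_cluster_out(cluster_labels))
```

Unlike random k-fold splitting, no test sample shares a cluster with any training sample, so the per-split error reflects extrapolation to an unseen group rather than interpolation within a known one.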
“…In recent years, machine learning (ML) property prediction models trained on first-principles simulation data have further accelerated this discovery process 4,11–18 throughout chemistry, 19–24 including for catalysis 15,16,25,26 and materials. 4,27–34 Unique challenges arise in applying these tools to the discovery of open shell transition metal complexes, despite their importance as selective catalysts 35–43 and functional materials (e.g., molecular switches or sensors 44–52). The theoretical chemical space of inorganic complexes is diverse and relatively unexplored due to the variable spin states, oxidation states, and coordination num...…”
Section: Introduction; citation type: mentioning
confidence: 99%