2020
DOI: 10.1038/s41524-020-00406-3

Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm

Abstract: We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13 ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material’s composition and/or crystal …

Cited by 210 publications (284 citation statements)
References 45 publications

“…For this work, we obtained both computational and experimental materials data for benchmarking. Our benchmark data includes materials properties from the Matbench dataset as provided by Dunn et al. [36]. In addition, materials properties data from a number of works [6, 54-57] are collected, which are referred to as the Extended dataset.…”
Section: Data and Materials Properties Procurement
mentioning
confidence: 99%
“…To address the diversity of learning challenges, in Dunn et al, the Automatminer framework uses computationally expensive searches to optimize classical modeling techniques. They demonstrate effective learning on some data, but show shortcomings when deep learning is appropriate [36].…”
Section: Introduction
mentioning
confidence: 99%
“…The random forest is an ensemble of many such trees, where the predictions of uncorrelated trees are averaged over to reduce overfitting. Ten percent of the data were held as a test set, hyperparameters were tuned through a grid search, and fivefold cross-validation was used for validation, following the conventions in Matbench and Automatminer (38).…”
Section: Results
mentioning
confidence: 99%
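The evaluation protocol described in the statement above (a 10% held-out test set, a grid search over hyperparameters, and fivefold cross-validation) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the citing authors' code: the synthetic data, the hyperparameter grid, the mean-absolute-error scoring, and the helper name `fit_random_forest` are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, train_test_split


def fit_random_forest(X, y, seed=0):
    """Hypothetical helper: hold out 10%, then grid-search with 5-fold CV."""
    # Hold out ten percent of the data as a test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, random_state=seed
    )
    # Illustrative grid only; the source does not state which values were searched.
    param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
    # Fivefold cross-validation on the training portion.
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid,
        cv=cv,
        scoring="neg_mean_absolute_error",  # assumed metric
    )
    search.fit(X_train, y_train)
    # Score the refit best model once on the untouched test set.
    test_mae = float(np.mean(np.abs(search.predict(X_test) - y_test)))
    return search, test_mae


# Synthetic stand-in data (assumption): 200 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=200)

search, test_mae = fit_random_forest(X, y)
print(search.best_params_, test_mae)
```

Keeping the test split outside the grid search means the reported test error is not influenced by hyperparameter selection.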
“…Prediction and discovery of new materials using computation remains a longstanding challenge [1-7]. The challenge arises in part from the vast compositional and structural phase space in which materials live [1, 8, 9]. The large phase space, combined with the complex way the materials energy landscape varies with chemistry and crystal structure [8], makes discovering a new stable compound akin to finding a needle in a haystack.…”
Section: Introduction
mentioning
confidence: 99%
“…4 where A, B are elements with known oxidation states of +2, +3 respectively and C is restricted to O, S, Se, and Te. From pure elemental substitutions, this generates a set of ∼14,200 total possible compounds of which 200 are the known stable spinels.…”
mentioning
confidence: 99%