2018
DOI: 10.1038/s41524-018-0081-z

A strategy to apply machine learning to small datasets in materials science

Abstract: There is growing interest in applying machine learning techniques in the research of materials science. However, although it is recognized that materials datasets are typically smaller and sometimes more diverse compared to other fields, the influence of availability of materials data on training machine learning models has not yet been studied, which prevents the possibility to establish accurate predictive rules using small materials datasets. Here we analyzed the fundamental interplay between the availabili…

Cited by 563 publications (403 citation statements)
References 63 publications
“…For example, using the mean and standard deviation of elemental descriptors and BOP, the best model developed on 110 compounds for log-scaled κ_l gives RMSE of 0.096 [6]. In another model developed on 93 compounds, the difference between experimental and predicted values of κ_l has been reported to lie within a factor of 1.5-2 [52]. Using the descriptors related directly to the physics of κ_l, a prediction model developed on 120 compounds gives RMSE of 0.21 [22].…”
Section: Results (mentioning)
confidence: 99%
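The statement above compares errors reported as an RMSE on log-scaled κ_l with errors reported as a multiplicative factor. The sketch below shows one way to put the two on the same footing. It assumes a base-10 logarithm, which the excerpt does not state, and the function names and input values are purely illustrative, not data from any cited paper.

```python
import math

def rmse_log10(y_true, y_pred):
    """RMSE between log10-scaled true and predicted values.

    Assumption: base-10 logarithm; the excerpt only says "log-scaled".
    """
    diffs = [math.log10(p) - math.log10(t) for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def factor_to_log10_error(factor):
    """Log10 error corresponding to being off by a multiplicative factor."""
    return math.log10(factor)

# Illustrative (made-up) values: every prediction is exactly 2x too high,
# so the log10 RMSE equals log10(2).
example_true = [1.0, 10.0]
example_pred = [2.0, 20.0]
example_rmse = rmse_log10(example_true, example_pred)

# A "factor of 1.5-2" corresponds to this range of log10 errors.
factor_range = (factor_to_log10_error(1.5), factor_to_log10_error(2.0))
```

Under the base-10 assumption, a factor of 1.5 to 2 corresponds to a log-scale error of roughly 0.18 to 0.30. The comparison is only approximate, because an RMSE averages over the dataset while a factor bound describes individual predictions.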
“…When both α_1 and α_2 are nonzero, the elastic net model is obtained. The ridge and LASSO regression models have found wide applications in materials ML to solve the small‐data problem, compute phonons using compressive sensing, perform feature selection, and avoid overfitting …”
Section: Model Selection and Training (mentioning)
confidence: 99%
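The relationship between ridge, LASSO, and elastic net in the statement above can be sketched as a single penalized least-squares objective. The excerpt does not show the citing paper's equation, so this sketch makes an assumption: α_1 weights the L1 (LASSO) penalty and α_2 weights the L2 (ridge) penalty. The function name and the mean-squared-error scaling are likewise illustrative choices.

```python
def elastic_net_objective(X, y, w, alpha1, alpha2):
    """Mean squared error plus L1 and L2 penalties on the weights.

    Assumed convention (not confirmed by the excerpt):
      alpha1 scales the L1 penalty, alpha2 scales the L2 penalty.
      alpha1 > 0, alpha2 = 0  -> LASSO
      alpha1 = 0, alpha2 > 0  -> ridge
      both nonzero            -> elastic net
    """
    n = len(y)
    # Residual of the linear model X @ w against the targets y.
    residuals = [
        sum(x_ij * w_j for x_ij, w_j in zip(row, w)) - y_i
        for row, y_i in zip(X, y)
    ]
    mse = sum(r * r for r in residuals) / n
    l1_penalty = sum(abs(w_j) for w_j in w)
    l2_penalty = sum(w_j * w_j for w_j in w)
    return mse + alpha1 * l1_penalty + alpha2 * l2_penalty

# Tiny made-up example in which the weights fit the data exactly,
# so only the penalty terms contribute to the objective.
X_demo = [[1.0, 0.0], [0.0, 1.0]]
y_demo = [1.0, 2.0]
w_demo = [1.0, 2.0]

lasso_value = elastic_net_objective(X_demo, y_demo, w_demo, 0.1, 0.0)
ridge_value = elastic_net_objective(X_demo, y_demo, w_demo, 0.0, 0.01)
enet_value = elastic_net_objective(X_demo, y_demo, w_demo, 0.1, 0.01)
```

On small datasets the penalties shrink the weights and so limit overfitting. The L1 term can also drive some weights exactly to zero, which is why LASSO is used for feature selection.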
“…[18,21] In the absence of an analytic model or high-throughput defect calculations, statistical learning from experimental or computational data can serve as an alternative to create empirical models and rules of thumb to make predictions of dopability in new compounds. Machine learning has proven successful in understanding and predicting energy and entropy [29], potentials and forces [30-32], structure, physical, and elastic properties [33-38], bandgap [34,39,40], and defects [41], as well as enabling high-throughput screening and discovery [42-46], and guiding experimental synthesis [47,48]. In order to properly model and interpret dopability, the construction of an empirical dataset for cross-validation is of vital importance.…”
Section: Introduction (mentioning)
confidence: 99%