2016
DOI: 10.3389/fmats.2016.00028
Theory-Guided Machine Learning in Materials Science

Abstract: Materials scientists are increasingly adopting the use of machine learning tools to discover hidden trends in data and make predictions. Applying concepts from data science without foreknowledge of their limitations and the unique qualities of materials data, however, could lead to errant conclusions. The differences that exist between various kinds of experimental and calculated data require careful choices of data processing and machine learning methods. Here, we outline potential pitfalls involved in using …

Cited by 157 publications
(105 citation statements)
References 34 publications
“…3 We ultimately emphasize that materials science is currently undergoing a "change of paradigm: from description to prediction" (Heine, 2014). Thus, we expect these tools to be useful in future machine-learning (Jain et al, 2016;Ward and Wolverton, 2017) applications as descriptors that capture much of the most basic-but essential (Wagner and Rondinelli, 2016)-information of a given material: the crystal structure.…”
Section: Results
confidence: 99%
“…This down‐selection process can be automatic, e.g., using L1 or L0 regularization (least absolute shrinkage and selection operator, LASSO), feature importance, genetic algorithms, etc. However, a drawback of data‐driven feature selection is that the selected features do not imply causality with respect to the target and will be highly dependent on the chosen hyperparameters of the model. [105] Yet another approach is to use dimension reduction algorithms to “synthesize” new and low‐dimensional features from the original features. The principal component analysis (PCA) [106] is widely used in this context.…”
Section: Featurization
confidence: 99%
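The L1-regularized down-selection described in the statement above can be sketched with scikit-learn. This is a minimal illustrative example on synthetic data, not the cited paper's workflow; the feature count, target, and `alpha` value are all assumptions.

```python
# Hedged sketch of LASSO-based feature down-selection (synthetic data;
# the descriptors and alpha hyperparameter are illustrative, not from the paper).
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # 10 candidate descriptors
# Only descriptors 0 and 3 actually influence the target.
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

# Standardize so the L1 penalty treats all descriptors comparably.
Xs = StandardScaler().fit_transform(X)

# alpha controls sparsity; as the statement cautions, the surviving
# features depend strongly on this hyperparameter choice.
model = Lasso(alpha=0.1).fit(Xs, y)
selected = np.flatnonzero(model.coef_ != 0)
print("surviving feature indices:", selected)
```

With this signal-to-noise ratio the penalty zeroes out the irrelevant descriptors, leaving the two informative ones, but the selected set says nothing about causality, which is precisely the drawback the statement highlights.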
“…Notably, the persistent diagram was evenly divided into 2500 bins for kernel density estimation in the present work, that is, the source data of PI possesses a dimension of 2500. In machine learning, an excess of model variables often leads to overfitting. [22] Therefore, principal component analysis (PCA) was employed to reduce the dimension of the dataset. PCA normalizes the high-dimension dataset with correlated variables and converts it into a set of linearly uncorrelated vectors.…”
Section: Results
confidence: 99%
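The PCA step described in that statement can be sketched as follows. The 2500-bin dimension is taken from the statement itself; the synthetic data, the number of latent factors, and the 95% variance threshold are assumptions for illustration only.

```python
# Hedged sketch of PCA dimension reduction on a high-dimensional, correlated
# descriptor set (synthetic stand-in for the 2500-bin persistence-image data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 5))        # 5 hidden factors (assumed)
mixing = rng.normal(size=(5, 2500))       # spread into 2500 correlated bins
X = latent @ mixing + 0.01 * rng.normal(size=(300, 2500))

# Normalize, then keep the smallest number of uncorrelated components
# that explains 95% of the variance (threshold is an illustrative choice).
Xs = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)
Z = pca.fit_transform(Xs)
print("reduced shape:", Z.shape)          # far fewer than 2500 columns
```

Passing a float to `n_components` lets scikit-learn pick the component count from the explained-variance ratio, which is a common way to mitigate the overfitting risk the statement describes without hand-tuning the reduced dimension.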