Evaluating machine learning approaches for accelerating materials search and discovery in a realistic way is challenging. Machine learning approaches to
materials stability prediction are typically assessed by their ability to reproduce results
from direct physical modeling, whereas ideally both machine learning and direct physical modeling should be assessed by their ability to reproduce reality. Additionally,
traditional evaluation metrics do not directly reflect the experience of an experimental
search for unknown compounds in a large candidate phase space, and often result in
overly optimistic assessments. Here, we (i) present a framework that combines density
functional theory and traditional supervised machine learning methods (ML/DFT),
and (ii) introduce the concepts of search completeness (the fraction of discoverable
compounds found relative to the fraction of search space explored) and search efficiency (the rate of discovery relative to the fraction of search space explored) to evaluate it. ML/DFT is an iterative approach for predicting stable
chemistries of a fixed crystal structure (here, spinels) in which DFT generates the
unstable examples of the training set, while the stable examples are
experimentally known spinels. We apply the method with random forest, LASSO,
and ridge regression models to predict as-yet undiscovered spinel chemistries. TreeSHAP
analysis identifies the features that contribute most to the stability/instability classification. While no single feature dominates, several emerge that align with chemical
intuition. To estimate the efficacy of ML/DFT compared to pure DFT, we introduce
a Bayesian description of the distribution of DFT energies for stable and unstable spinels.
The Bayesian model enables quantifying the search completeness and search efficiency
of DFT, which we then compare with that of ML/DFT. ML/DFT achieves search completeness and efficiency on par with pure DFT while requiring far fewer DFT simulations
(∼300 vs. 14,200). More importantly, by assessing ML approaches quantitatively in
ways that better reflect how they would be used in materials discovery experiments,
we obtain key insights into the challenges such methods must overcome:
because only a small number of stable compounds are to be found in a search space orders of
magnitude larger, achieving good search efficiency places stringent demands on model accuracy. Finally, we report the top candidates from our spinel search, which may be of
interest for synthesis experiments.
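To make the two evaluation metrics concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that candidates are explored in descending order of a model's stability score. Search completeness is taken to be the fraction of all stable compounds found after exploring a given fraction of the candidate space. Search efficiency is taken to be that completeness divided by the fraction explored, so random search has an expected efficiency of about 1. The function name `search_curves` and the toy data are hypothetical illustrations.

```python
from typing import List, Sequence, Tuple


def search_curves(
    scores: Sequence[float], is_stable: Sequence[bool]
) -> List[Tuple[float, float, float]]:
    """Return (fraction_explored, completeness, efficiency) after each step.

    Assumption (hypothetical): candidates are explored in descending order of
    `scores`, e.g. a classifier's predicted probability of stability.
    """
    if len(scores) != len(is_stable):
        raise ValueError("scores and is_stable must have the same length")
    n = len(scores)
    n_stable = sum(bool(s) for s in is_stable)
    if n == 0 or n_stable == 0:
        raise ValueError("need at least one candidate and one stable compound")

    # Rank candidates from most to least promising according to the model.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)

    curve = []
    found = 0
    for k, idx in enumerate(order, start=1):
        found += bool(is_stable[idx])
        fraction_explored = k / n
        # Completeness: share of all stable compounds discovered so far.
        completeness = found / n_stable
        # Efficiency (assumed definition): completeness per unit of search
        # space explored; random exploration gives about 1 on average.
        efficiency = completeness / fraction_explored
        curve.append((fraction_explored, completeness, efficiency))
    return curve


if __name__ == "__main__":
    # Toy data: 10 candidates, 2 of which are stable (ranked 1st and 3rd).
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    toy_stable = [True, False, True] + [False] * 7
    for f, c, e in search_curves(toy_scores, toy_stable):
        print(f"explored={f:.1f} completeness={c:.2f} efficiency={e:.2f}")
```

With these toy data, exploring the top 10% of candidates finds half of the stable compounds (completeness 0.5, efficiency 5.0), and exploring the top 30% finds both (completeness 1.0, efficiency about 3.33). Once the whole space is explored, efficiency falls to 1, which illustrates why a scarce set of stable compounds makes high efficiency depend on ranking them near the top.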