Discovery of new perovskite materials is motivated by a broad range of materials applications and accelerated by recent advances in machine learning (ML). We herein report dataset augmentation, benchmarking, and interrogation for an ongoing experimental campaign consisting of 9483 halide perovskite synthesis experiments. To address limitations in previous work, we developed an improved description of the reactant concentrations in the experiments (validated against experimental observations) and performed experiments quantifying the excess volume of mixing of γ-butyrolactone/formic acid mixtures used in the perovskite syntheses. Combining this improved description of reactant concentration with other physicochemical features of the reactants, we constructed 1108 ML models to elucidate the roles of the algorithm (k-nearest neighbors, linear support-vector machine, and gradient-boosted tree), feature set (12 in total), preprocessing regime (e.g., standardization), and training data holdout scheme on ML predictive ability. ML comparisons illustrated that the lower chemical accuracy of less sophisticated physical models in a dataset does not hinder interpolative model performance. Analysis of feature contributions showed how ML models "learn" competitive representations for concentration using raw experimental descriptions. Interrogation of the most performant models indicated that the numerical values of physicochemical features were not important; rather, these features were being used to identify and interpolate within a particular reactant set. ML models were shown to be capable of making rudimentary extrapolations to untrained chemical systems when compared against basic benchmarks, and models that included the newly developed chemical features were shown to be more reliable than models trained without them.
These results illustrate how a stepwise comparative approach to machine learning can provide insight into what and how much models are "learning" for a given prediction task.
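The excess volume of mixing mentioned above is the difference between the measured molar volume of a mixture and the mole-fraction-weighted sum of the pure-component molar volumes. The sketch below shows that standard thermodynamic definition for a binary mixture, computed from densities. It is an illustration only, not the authors' procedure: the function name `excess_molar_volume`, its parameter names, and all numerical inputs in the usage lines are assumptions chosen for the example rather than data from the study.

```python
def excess_molar_volume(x1, m1, rho1, m2, rho2, rho_mix):
    """Excess molar volume (cm^3/mol) of a binary mixture.

    x1      : mole fraction of component 1 (component 2 is 1 - x1)
    m1, m2  : molar masses in g/mol
    rho1/2  : pure-component densities in g/cm^3
    rho_mix : measured density of the mixture in g/cm^3
    """
    if not 0.0 <= x1 <= 1.0:
        raise ValueError("mole fraction must lie in [0, 1]")
    x2 = 1.0 - x1
    # Molar volume of the real mixture, from its measured density.
    v_real = (x1 * m1 + x2 * m2) / rho_mix
    # Molar volume the mixture would have if volumes were additive.
    v_ideal = x1 * m1 / rho1 + x2 * m2 / rho2
    return v_real - v_ideal


# Placeholder inputs (assumed, approximate; NOT measurements from the study).
M_GBL, RHO_GBL = 86.09, 1.13   # gamma-butyrolactone
M_FA, RHO_FA = 46.03, 1.22     # formic acid
HYPOTHETICAL_RHO_MIX = 1.17    # invented mixture density at x1 = 0.5

v_excess = excess_molar_volume(0.5, M_GBL, RHO_GBL, M_FA, RHO_FA,
                               HYPOTHETICAL_RHO_MIX)
print(f"excess molar volume: {v_excess:.3f} cm^3/mol")
```

A negative result means the mixture occupies less volume than the additive estimate, so concentrations computed by assuming additive volumes would be too low; a positive result means the opposite.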