An important aspect of chemoinformatics and material informatics is the use of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for removing noise from datasets. RANSAC can be used as a “one stop shop” algorithm for developing and validating QSAR models: it performs outlier removal, descriptor selection, model development, and prediction for test set samples using an applicability domain. For “future” predictions (i.e., for samples not included in the original test set), RANSAC provides a statistical estimate of the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that, for three datasets representing different metal oxide (MO) based solar cell libraries, RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic (PV) properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cell libraries, highlighting interesting dependencies of PV properties on MO compositions.
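The outlier-removal idea behind RANSAC can be illustrated with a minimal sketch. It is a generic, textbook-style RANSAC for a one-descriptor linear model, not the QSAR workflow described in the abstract: it does not cover descriptor selection, the applicability domain, or the reliability estimate. The function names `fit_line` and `ransac_line`, the residual threshold, the number of trials, and the synthetic data are all assumptions made for illustration.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit y = a*x + b for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def ransac_line(points, n_trials=200, threshold=0.5, seed=0):
    """Hypothetical helper: keep the largest consensus set, then refit on it."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_trials):
        # Draw the minimal sample needed to define a line.
        sample = rng.sample(points, 2)
        if sample[0][0] == sample[1][0]:
            continue  # vertical pair, cannot define y = a*x + b
        a, b = fit_line(sample)
        # Consensus set: points whose residual is within the threshold.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise RuntimeError("no consensus set found")
    # Final model is refit on all inliers of the best trial.
    a, b = fit_line(best_inliers)
    return a, b, best_inliers


# Synthetic data (assumed): y = 2x + 1 with small noise, plus four gross outliers.
clean = [(float(i), 2.0 * i + 1.0 + 0.1 * ((i % 3) - 1)) for i in range(20)]
outliers = [(3.0, 40.0), (7.0, -25.0), (12.0, 60.0), (16.0, -10.0)]
points = clean + outliers

slope, intercept, inliers = ransac_line(points)
print(f"slope={slope:.3f} intercept={intercept:.3f} inliers={len(inliers)}/{len(points)}")
```

In this sketch the points left outside the final consensus set play the role of the removed outliers; the threshold controls how strictly a sample must agree with the model to be kept.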
Material informatics may provide meaningful insights and powerful predictions for the development of new and efficient Metal Oxide (MO) based solar cells. The main objective of this paper is to establish the usefulness of data reduction and visualization methods for analyzing data sets emerging from multiple all-MO solar cell libraries. For this purpose, two libraries, TiO2|Co3O4 and TiO2|Co3O4|MoO3, differing only by the presence of a MoO3 layer in the latter, were analyzed with Principal Component Analysis and Self-Organizing Maps. Both analyses suggest that adding the MoO3 layer to the TiO2|Co3O4 library affected the overall photovoltaic (PV) activity profile of the solar cells, making the two libraries clearly distinguishable from one another. Furthermore, while MoO3 had an overall favorable effect on PV parameters, a sub-population of cells was identified that was either indifferent to its presence or even showed a reduction in several parameters.