Quantitative structure–property relationship (QSPR) studies based on deep neural networks (DNNs) are receiving increasing attention because of their excellent performance. A systematic methodology coupling multiple machine learning techniques is proposed to address vital problems in DNN-based QSPR modeling, including the applicability domain and prediction uncertainty. Key features are rapidly extracted from plentiful but chaotic descriptors by principal component analysis (PCA) and kernel PCA. A detailed applicability domain (AD) is then defined with the K-means algorithm to avoid unreliable predictions and to examine its potential impact on prediction uncertainty. Prediction uncertainty is analyzed with a dropout-embedded DNN over thousands of independent tests to assess the reliability of predictions. The prediction of flashpoint temperature is employed as a case study, demonstrating that model accuracy is remarkably improved compared with the reference model. Furthermore, the proposed methodology overcomes difficulties in analyzing the uncertainty of DNN-based QSPRs and presents an AD correlated with the uncertainty.
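The three steps named in the abstract (feature extraction, a cluster-based applicability domain, and dropout-based uncertainty) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the synthetic descriptor matrix, the number of components and clusters, the rule that a query is in-domain when it lies within the largest training distance of its nearest centroid, and the small untrained network with random weights. Only linear PCA is shown; kernel PCA is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a molecular descriptor matrix (not data from the study).
X_train = rng.normal(size=(200, 30))

def pca_fit(X, n_components):
    """Linear PCA via SVD: return the feature mean and the top principal axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def nearest_centroid(Z, centroids):
    """Return the index of, and the distance to, the closest centroid per row."""
    d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    return labels, d[np.arange(len(Z)), labels]

def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random initial centroids."""
    r = np.random.default_rng(seed)
    centroids = Z[r.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels, _ = nearest_centroid(Z, centroids)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = Z[labels == j].mean(axis=0)
    return centroids

# Step 1: compress the descriptors to a few principal components.
mean, axes = pca_fit(X_train, n_components=5)
Z_train = (X_train - mean) @ axes.T

# Five ordinary queries plus one placed deliberately far along the first axis.
X_query = np.vstack([rng.normal(size=(5, 30)), mean + 100.0 * axes[0]])
Z_query = (X_query - mean) @ axes.T

# Step 2: cluster the training set and define an AD radius per cluster
# (assumed rule: the largest distance from a training point to its centroid).
k = 3
centroids = kmeans(Z_train, k)
train_labels, train_dist = nearest_centroid(Z_train, centroids)
radii = np.array([
    train_dist[train_labels == j].max() if np.any(train_labels == j) else 0.0
    for j in range(k)
])
query_labels, query_dist = nearest_centroid(Z_query, centroids)
in_domain = query_dist <= radii[query_labels]

# Step 3: Monte Carlo dropout on a tiny network with random, UNTRAINED weights.
# Only the sampling procedure is illustrated; the predictions are meaningless.
W1, b1 = rng.normal(scale=0.5, size=(5, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(16, 1)), np.zeros(1)

def mc_dropout_predict(Z, p_drop=0.2, n_passes=1000, seed=1):
    """Keep dropout active at inference and summarise repeated forward passes."""
    r = np.random.default_rng(seed)
    h = np.maximum(0.0, Z @ W1 + b1)           # ReLU hidden layer
    preds = np.empty((n_passes, len(Z)))
    for t in range(n_passes):
        keep = r.random(h.shape) >= p_drop     # keep each unit with prob 1 - p_drop
        preds[t] = ((h * keep / (1.0 - p_drop)) @ W2 + b2).ravel()
    return preds.mean(axis=0), preds.std(axis=0)

pred_mean, pred_std = mc_dropout_predict(Z_query)
print(in_domain, pred_mean, pred_std)
```

In this sketch the standard deviation across stochastic passes serves as the uncertainty estimate, and the `in_domain` mask can be compared against it to see whether out-of-domain queries tend to be the uncertain ones. A trained model and real descriptors would be needed before drawing any conclusion.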