<p>A priori knowledge of melting and boiling could expedite the discovery of pharmaceutical, energetic, and energy-harvesting materials. The tools of data science are becoming increasingly important for exploring chemical datasets and predicting material properties. A fundamental part of data-driven modeling is molecular featurization. Herein, we propose a molecular representation with group-constitutive and geometrical descriptors that map to enthalpy and entropy, two thermodynamic quantities that drive thermal phase transitions. The descriptors are inspired by the linear regression-based quantitative structure-property relationship of Yalkowsky and coworkers known as the Unified Physicochemical Property Estimation Relationships (UPPER). Combined with nonlinear machine learning (specifically, eXtreme Gradient Boosting or XGBoost), these concise and easy-to-compute descriptors provide an appealing framework for predicting transition enthalpies, entropies, and temperatures in a diverse chemical space. An application to energetic materials shows that UPPER plus XGBoost is predictive, despite a relatively modest energetics reference dataset. We also report results on public datasets of melting points (namely OCHEM, Enamine, Bradley, and Bergstrom). The newly proposed representation is determined purely from a SMILES string, thus showing promise for fast and accurate screening of thermodynamic properties.</p>
<p>A priori knowledge of physicochemical properties such as melting and boiling points could expedite materials discovery. However, theoretical modeling from first principles poses a challenge for efficient virtual screening of potential candidates.</p><p>As an alternative, the tools of data science are becoming increasingly important for exploring chemical datasets and predicting material properties. Herein, we extend the molecular representation, or set of descriptors, of the Unified Physicochemical Property Estimation Relationships (UPPER), first developed for quantitative structure-property relationship modeling by Yalkowsky and coworkers. This molecular representation has group-constitutive and geometrical descriptors that map to enthalpy and entropy, two thermodynamic quantities that drive thermal phase transitions. We extend the UPPER representation to include additional information about <i>sp<sup>2</sup></i>-bonded fragments. Additionally, instead of using the UPPER descriptors in a series of thermodynamically inspired calculations, as per Yalkowsky, we use the descriptors to construct a vector representation for use with machine learning techniques. The concise and easy-to-compute representation, combined with a gradient-boosting decision tree model, provides an appealing framework for predicting experimental transition temperatures in a diverse chemical space. An application to energetic materials shows that the method is predictive, despite a relatively modest energetics reference dataset. We also report competitive results on diverse public datasets of melting points (namely OCHEM, Enamine, Bradley, and Bergstrom) comprising over 47k structures. Open source software is available at https://github.com/USArmyResearchLab/ARL-UPPER.</p>
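<p>For readers wondering why enthalpy and entropy together fix a transition temperature, the standard equilibrium argument can be sketched as follows. The symbols are our own notation and are not taken from the abstract: the subscript "tr" stands for a generic transition (melting or boiling), and the two phases are assumed to be in equilibrium at constant pressure.</p>

```latex
% Assumed notation (not from the abstract): "tr" = melting or boiling.
% At the transition temperature the two phases coexist, so the Gibbs
% free energy change of the transition vanishes.
\begin{align}
  \Delta G_{\mathrm{tr}}
    &= \Delta H_{\mathrm{tr}} - T_{\mathrm{tr}}\,\Delta S_{\mathrm{tr}} = 0 \\
  \Longrightarrow\quad
  T_{\mathrm{tr}}
    &= \frac{\Delta H_{\mathrm{tr}}}{\Delta S_{\mathrm{tr}}}
\end{align}
```

<p>Under this relation, descriptors that track the transition enthalpy and descriptors that track the transition entropy jointly carry the information needed to estimate the transition temperature, which is presumably why a representation built on both is a natural input for a learned model.</p>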