Chemical databases are an essential tool for data-driven investigation of structure–property relationships and for the design of novel functional compounds. We introduce the first phase of the COMPAS Project: a COMputational database of Polycyclic Aromatic Systems. In this phase, we developed two data sets containing the optimized ground-state structures and a selection of molecular properties of ∼34k and ∼9k cata-condensed polybenzenoid hydrocarbons (at the GFN2-xTB and B3LYP-D3BJ/def2-SVP levels, respectively) and placed them in the public domain. Herein, we describe the process of the data set generation, detail the information available within the data sets, and show the fundamental features of the generated data. We analyze the correlation between the two types of computations as well as the structure–property relationships of the calculated species. The data and insights gained from them can inform rational design of novel functional aromatic molecules for use in, e.g., organic electronics, and can provide a basis for additional data-driven machine- and deep-learning studies in chemistry.
Polycyclic aromatic systems are prevalent in chemistry and materials science because their thermodynamic stability, planarity, and tunable electronic properties make them uniquely suited for various uses. These properties are closely linked to the aromaticity of the systems. Therefore, characterizing the aromatic behavior is useful for designing new functional compounds and understanding their reactivity. NICS-XY-scans are a popular and simple tool for investigating the aromatic trends in polycyclic systems. Herein we present Predi-XY: an automated system for generating NICS-XY-scans for polycyclic aromatic systems using an additivity scheme. The program provides the predicted scans at a fraction of the computational cost of a full quantum mechanical calculation and enables rapid comparison of various polycyclic aromatic systems.
In this work, interpretable deep learning was used to identify structure-property relationships governing the HOMO-LUMO gap and the relative stability of polybenzenoid hydrocarbons (PBHs) using a ring-based graph representation. This representation was combined with a subunit-based perception of PBHs, allowing chemical insights to be presented in terms of intuitive and simple structural motifs. The resulting insights agree with conventional organic chemistry knowledge and electronic structure-based analyses and also reveal new behaviors and identify influential structural motifs. In particular, we evaluated and compared the effects of linear, angular, and branching motifs on these two molecular properties and explored the role of dispersion in mitigating the torsional strain inherent in nonplanar PBHs. Hence, the observed regularities and the proposed analysis contribute to a deeper understanding of the behavior of PBHs and form the foundation for design strategies for new functional PBHs.
New tools are developed and applied to enable the use of interpretable machine learning for investigation of structure–property relationships in polybenzenoid hydrocarbons (PBHs). A textual molecular representation, which is based on the annulation sequence of PBHs, is shown to be of utility either in its textual form or as a basis for a curated feature vector. Both forms display interpretability exceeding that achievable with the standard SMILES representation, and the former also has increased predictive accuracy. A recently developed model, CUSTODI, was applied for the first time as an interpretable model, identifying important structural features that impact various electronic molecular properties. The resulting insights not only validate several well-known “rules of thumb” of organic chemistry but also reveal new behaviors and influential structural motifs, thus providing guiding principles for rational design and fine-tuning of PBHs.
In this work, interpretable deep learning was used to identify structure-property relationships governing the HOMO-LUMO gap and relative stability of polybenzenoid hydrocarbons (PBHs). To this end, a ring-based graph representation was used. In addition to affording reduced training times and excellent predictive ability, this representation could be combined with a subunit-based perception of PBHs, allowing chemical insights to be presented in terms of intuitive and simple structural motifs. The resulting insights agree with conventional organic chemistry knowledge and electronic structure-based analyses, and also reveal new behaviors and identify influential structural motifs. In particular, we evaluated and compared the effects of linear, angular, and branching motifs on these two molecular properties, and explored the role of dispersion in mitigating the torsional strain inherent in non-planar PBHs. Hence, the observed regularities and the proposed analysis contribute to a deeper understanding of the behavior of PBHs and form the foundation for design strategies for new functional PBHs.
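To make the idea of a ring-based graph concrete, the following is a minimal sketch, not the authors' implementation. It assumes each six-membered ring is given as a tuple of atom indices in cyclic order. Rings become nodes, and two rings are connected when they share exactly two atoms (a fused bond). The function names `ring_graph` and `classify_rings`, the input format, and the "terminal" label are all hypothetical choices for illustration. For a cata-condensed system, a ring with two neighbours is labelled linear when the two fused bonds sit opposite each other in the hexagon and angular otherwise; a ring with three neighbours is labelled branching.

```python
from itertools import combinations


def ring_graph(rings):
    """Build a ring adjacency map: {ring_index: {neighbour_index: shared_atoms}}.

    Two rings are treated as fused when they share exactly two atoms.
    """
    adjacency = {i: {} for i in range(len(rings))}
    for i, j in combinations(range(len(rings)), 2):
        shared = set(rings[i]) & set(rings[j])
        if len(shared) == 2:
            adjacency[i][j] = shared
            adjacency[j][i] = shared
    return adjacency


def _bond_position(ring, shared):
    """Return k such that the bond ring[k]-ring[k+1] is the shared bond."""
    n = len(ring)
    for k in range(n):
        if {ring[k], ring[(k + 1) % n]} == shared:
            return k
    raise ValueError("shared atoms are not bonded neighbours in this ring")


def classify_rings(rings):
    """Label each six-membered ring as terminal, linear, angular, or branching."""
    adjacency = ring_graph(rings)
    labels = []
    for i, ring in enumerate(rings):
        neighbours = adjacency[i]
        if len(neighbours) <= 1:
            labels.append("terminal")
        elif len(neighbours) == 2:
            k1, k2 = (_bond_position(ring, s) for s in neighbours.values())
            # Opposite (para) fused bonds in a hexagon are three positions apart.
            labels.append("linear" if (k1 - k2) % 6 == 3 else "angular")
        else:
            labels.append("branching")
    return labels


# Hand-numbered illustrative ring lists (atom indices are arbitrary labels).
anthracene = [(0, 1, 2, 3, 4, 5), (4, 5, 6, 7, 8, 9), (7, 8, 10, 11, 12, 13)]
phenanthrene = [(0, 1, 2, 3, 4, 5), (4, 5, 6, 7, 8, 9), (6, 7, 10, 11, 12, 13)]

print(classify_rings(anthracene))
print(classify_rings(phenanthrene))
```

In practice the ring lists would come from a cheminformatics toolkit rather than being written by hand, and a learning model would operate on the resulting graph; that step is outside the scope of this sketch.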