2022
DOI: 10.1002/poc.4458

Text‐based representations with interpretable machine learning reveal structure–property relationships of polybenzenoid hydrocarbons

Abstract: New tools are developed and applied to enable the use of interpretable machine learning for investigation of structure–property relationships in polybenzenoid hydrocarbons (PBHs). A textual molecular representation, which is based on the annulation sequence of PBHs, is shown to be of utility either in its textual form or as a basis for a curated feature vector. Both forms display interpretability exceeding that achievable with the standard SMILES representation, and the former also has increased predictive accuracy…
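The abstract describes an annulation-sequence string that can serve directly as input or as the basis for a curated feature vector. The Python sketch below illustrates the second use under stated assumptions: the two-letter alphabet (`L` for a linear and `A` for an angular tricyclic annulation), the function name `annulation_features`, and the chosen features are all hypothetical and are not taken from the paper. It covers only unbranched cata-condensed PBHs and ignores the direction of angular annulations, which a real encoding would need (for example, to tell chrysene from benzo[c]phenanthrene).

```python
from collections import Counter

# Hypothetical alphabet (an assumption, not the published encoding):
# 'L' = linear tricyclic annulation, 'A' = angular tricyclic annulation.
# Each symbol describes one consecutive triple of fused rings in an
# unbranched cata-condensed PBH, so a string of length k encodes k + 2 rings.
ALPHABET = {"L", "A"}


def annulation_features(seq: str) -> dict[str, int]:
    """Map an annulation string to a small, human-readable feature vector."""
    unknown = set(seq) - ALPHABET
    if unknown:
        raise ValueError(f"unexpected symbols {sorted(unknown)} in {seq!r}")
    counts = Counter(seq)
    # Overlapping count of adjacent angular annulations ("AAA" contains two).
    angular_pairs = sum(1 for i in range(len(seq) - 1) if seq[i : i + 2] == "AA")
    return {
        "n_rings": len(seq) + 2,
        "n_linear": counts["L"],
        "n_angular": counts["A"],
        "n_angular_pairs": angular_pairs,
    }


# Naphthalene (""), anthracene ("L"), phenanthrene ("A"), and a five-ring example.
for name, seq in [("naphthalene", ""), ("anthracene", "L"), ("phenanthrene", "A"), ("example", "LAA")]:
    print(name, annulation_features(seq))
```

Because every feature is a named count, a model trained on such a vector can be inspected feature by feature; this is one plausible reading of "curated," not the authors' actual feature set.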

Cited by 13 publications (8 citation statements); references 62 publications.

“…We recently utilized this to formulate a text-based representation that can be used with interpretable machine learning models [38]. In this work, we combined this subunit-based approach with the graph-of-rings (GOR) representation (see the Molecular Representation section in Computational Methods for further details) to extract further chemical insight using interpretable deep-learning models. Such graph representations were successfully employed previously in the investigation of PBHs.…”
Section: Results and Discussion (mentioning, confidence: 99%)
“…One of the main conclusions from this body of work was that many of the characteristics of cata-condensed PBHs are encoded in the sequence of tricyclic annulations, namely, linear or angular annulations. We recently utilized this to formulate a text-based representation that can be used with interpretable machine learning models. In this work, we combined this subunit-based approach with the graph-of-rings (GOR) representation (see the Molecular Representation section in Computational Methods for further details) to extract further chemical insight using interpretable deep-learning models.…”
Section: Results (mentioning, confidence: 99%)
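The snippet pairs the annulation sequence with a graph-of-rings (GOR) representation. As a rough illustration only, the Python sketch below builds a ring graph for an unbranched cata-condensed PBH from a string in the same hypothetical `L`/`A` alphabet: one node per ring, one edge per pair of fused rings, with each interior ring labelled by the annulation it sits in the middle of. The class `RingGraph`, the function `graph_of_rings`, and the terminal label `T` are invented for this example; the node and edge features of the published GOR may differ.

```python
from dataclasses import dataclass


@dataclass
class RingGraph:
    """Graph-of-rings sketch: one node per ring, one edge per pair of fused rings."""
    node_labels: list[str]
    edges: list[tuple[int, int]]


def graph_of_rings(seq: str) -> RingGraph:
    """Build a ring graph for an unbranched cata-condensed PBH.

    Assumes the hypothetical 'L'/'A' annulation alphabet. Terminal rings get the
    made-up label 'T'; every interior ring is the middle ring of exactly one
    tricyclic annulation and inherits that annulation's label.
    """
    if set(seq) - {"L", "A"}:
        raise ValueError(f"unexpected symbol in {seq!r}")
    labels = ["T", *seq, "T"]
    # In an unbranched cata-condensed system each ring is fused only to its neighbours.
    edges = [(i, i + 1) for i in range(len(labels) - 1)]
    return RingGraph(labels, edges)


g = graph_of_rings("LA")  # a four-ring system: linear, then angular annulation
print(g.node_labels, g.edges)
```

For unbranched systems the topology is always a simple path, so all of the chemical information sits in the node labels; branched systems would add ring nodes with three neighbours, which this sketch does not handle.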
“…Certain relationships have already been uncovered. For example, the molecular electronic properties of cata‐condensed PBHs are mostly determined by their longest linearly annulated subsection [7, 8], while the properties of peri‐condensed PBHs are determined by their size and edge attributes [9]. Indeed, a correlation has been found between the presence of zig‐zag edges, low band gap, and multiple spin states [10].…”
Section: Figure (mentioning, confidence: 99%)
“…The first installment, COMPAS-1 [60], focuses on ground-state cata-condensed polybenzenoid hydrocarbons; the second installment, COMPAS-2 [61], focuses on ground-state cata-condensed heterocyclic PASs. COMPAS-1 and COMPAS-2 have already been used to provide the first examples of interpretable machine- and deep-learning models for PASs [62, 63] and to demonstrate the first generative design of PASs with targeted properties [64]. Both datasets, as well as all future installments, are freely available for use, according to the FAIR [65] principles of data sharing.…”
Section: Introduction (mentioning, confidence: 99%)