2023
DOI: 10.1021/acs.jcim.2c01321

Predict Ionization Energy of Molecules Using Conventional and Graph-Based Machine Learning Models

Abstract: Ionization energy (IE) is an important property of molecules. It is highly desirable to predict IE efficiently based on, for example, machine learning (ML)-powered quantitative structure-property relationships (QSPR). In this study, we systematically compare the performance of different machine learning models in predicting the IE of molecules with distinct functional groups obtained from the NIST WebBook. Mordred and PaDEL are used to generate informative and computationally inexpensive descriptors for conven…
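The abstract describes conventional QSPR, where numeric descriptors (the role Mordred or PaDEL play) are mapped to IE by a regression model. The sketch below is a minimal ridge regression in plain Python, assuming the descriptors have already been computed as numeric vectors. The function names `solve`, `fit_ridge` and `predict` and the synthetic data are hypothetical; they are not the paper's models, descriptors, or NIST values.

```python
# Minimal QSPR sketch: ridge regression from descriptor vectors to a target such as IE.
# Assumption: descriptors are already numeric lists (the role Mordred/PaDEL would play);
# the data at the bottom are synthetic, not NIST WebBook values.

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(X, y, alpha=1.0):
    """Return [bias, w1, ..., wd] minimizing squared error + alpha * ||w||^2 (bias not penalized)."""
    rows = [[1.0] + list(x) for x in X]  # constant column for the bias
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(1, d):  # skip index 0 so the bias is not shrunk
        xtx[i][i] += alpha
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(coef, x):
    """Linear prediction: bias plus weighted sum of descriptors."""
    return coef[0] + sum(w * v for w, v in zip(coef[1:], x))


# Synthetic example: two hypothetical descriptors, target generated by a known linear rule.
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [3.0, 2.0], [1.5, 0.5]]
y = [9.0 + 0.5 * a - 1.2 * b for a, b in X]
coef = fit_ridge(X, y, alpha=1e-6)
print([round(c, 3) for c in coef])  # close to the generating coefficients 9.0, 0.5, -1.2
```

Ridge is chosen here only because its closed-form solution keeps the sketch short and dependency-free; a practical QSPR workflow would also standardize descriptors and evaluate on held-out molecules.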

Cited by 9 publications (9 citation statements)
References 45 publications
“…In fact, several new models for compound-kinase binding prediction are introduced every month [CCAS+15, CRA+21, DQJ+22, DSSGP22]. They differ in the learning algorithm used, such as simple k-nearest neighbor regression [BHS+21], decision trees [TAA+22], kernel learning [MM12, NPC16, CRP+17, CPS+18] and deep learning methods [BHS+21, O18, KZEK23, LLP23, SSB+23], as well as compound and protein descriptors, including compound SMILES and graphs [DTME20], protein amino acid sequences [BHS+21, KZEK23] and, lately, more complex 3D structure-based features [KZK+23, PHL+23, LKN+23, LTZ+23] and embeddings from pretrained large language models [SSB+23]. Most recent methods modeling compound-kinase activities learn from the descriptors of both compounds and kinases, and are referred to as proteochemometric models.…”
Section: Introduction (mentioning)
confidence: 99%
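The proteochemometric idea in this statement, learning from descriptors of both the compound and the kinase, can be illustrated with the simplest learner it mentions, k-nearest neighbor regression. In this sketch the pair representation is a plain concatenation of the two descriptor vectors and distances are Euclidean; the names `pcm_features` and `knn_regress` and all numbers are hypothetical toy values, not measured activities or any cited method's implementation.

```python
import math


def pcm_features(compound_desc, kinase_desc):
    """Proteochemometric pair representation: compound descriptors followed by kinase descriptors."""
    return list(compound_desc) + list(kinase_desc)


def knn_regress(train_X, train_y, query, k=3):
    """Predict the mean target of the k training pairs closest to `query` (Euclidean distance)."""
    ranked = sorted(zip((math.dist(x, query) for x in train_X), train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical toy descriptors and activities, for illustration only.
compounds = {"c1": [0.1, 0.9], "c2": [0.8, 0.2]}
kinases = {"kA": [1.0, 0.0], "kB": [0.0, 1.0]}
pairs = [("c1", "kA", 7.0), ("c1", "kB", 5.0), ("c2", "kA", 6.0), ("c2", "kB", 4.0)]

train_X = [pcm_features(compounds[c], kinases[p]) for c, p, _ in pairs]
train_y = [act for _, _, act in pairs]

# A new compound close to c1, queried against kinase kA.
query = pcm_features([0.15, 0.85], kinases["kA"])
print(knn_regress(train_X, train_y, query, k=1))  # uses only the closest pair, (c1, kA)
print(knn_regress(train_X, train_y, query, k=3))
```

Because the kinase descriptor is part of the distance, the same compound can receive different predictions for different kinases, which is what separates a proteochemometric model from a compound-only one.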
“…However, such explicit knowledge about ligand pose in the binding pocket may be unnecessary for predicting ligand binding scores. Indeed, several methods have been proposed that predict activity given just the ligand chemical graph representation and the 3D receptor structure or the ligand graph and the receptor amino acid sequence. Additionally, it is possible to train models that rely solely on docked poses, as is done by Liu et al…”
Section: Introduction (mentioning)
confidence: 99%
“…Indeed, several methods have been proposed that predict activity given just the ligand chemical graph representation and the 3D receptor structure [20] or the ligand graph and the receptor amino acid sequence [21–23]. Additionally, it is possible to train models that rely solely on docked poses, as is done by Liu et al. [24]. As mentioned above, deep learning has proven effective when using much more data than the 20K activity data points available in PDBbind or CrossDocked. Thus, we hypothesized that an expanded data set with orders of magnitude more binding data would result in more accurate models for predicting binders to novel proteins.…”
Section: Introduction (mentioning)
confidence: 99%
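This statement mentions models that take a ligand chemical graph and a receptor amino acid sequence as input. One minimal way to turn both into a fixed-length feature vector, sketched below under stated assumptions, is a parameter-free sum-aggregation message pass over the ligand graph followed by a sum readout, concatenated with the receptor's amino-acid composition. A learned graph neural network would replace the fixed update with trainable weights; the tiny element vocabulary and the names `graph_embedding` and `sequence_composition` are hypothetical, not any cited model's design.

```python
# Hypothetical element vocabulary for the sketch; a real featurizer would use many more atom features.
ELEMENTS = ["C", "N", "O"]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def one_hot(symbol, vocab):
    return [1.0 if symbol == v else 0.0 for v in vocab]


def graph_embedding(atoms, bonds, rounds=2):
    """Parameter-free message passing: each round adds neighbor features to each atom's own,
    then a sum readout gives one fixed-length vector per ligand graph."""
    h = [one_hot(a, ELEMENTS) for a in atoms]
    neighbors = [[] for _ in atoms]
    for i, j in bonds:  # undirected bonds between atom indices
        neighbors[i].append(j)
        neighbors[j].append(i)
    for _ in range(rounds):
        # Synchronous update: the new list is built entirely from the previous round's h.
        h = [
            [h[v][k] + sum(h[u][k] for u in neighbors[v]) for k in range(len(ELEMENTS))]
            for v in range(len(atoms))
        ]
    return [sum(col) for col in zip(*h)]


def sequence_composition(seq):
    """Fraction of each standard amino acid in the receptor sequence."""
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for an empty sequence
    return [c / total for c in counts]


# Example: ethanol's heavy-atom graph (C-C-O) paired with a short hypothetical peptide.
ligand = graph_embedding(["C", "C", "O"], [(0, 1), (1, 2)], rounds=1)
receptor = sequence_composition("AAG")
features = ligand + receptor  # 3 graph features + 20 composition features
print(ligand)  # [5.0, 0.0, 2.0]
print(len(features))  # 23
```

Sum aggregation and sum readout make the ligand vector independent of atom ordering; practical graph models typically add learned weights, nonlinearities, and bond features on top of this pattern.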
“…Indeed, several methods have been proposed that predict activity given just the ligand chemical graph representation and the 3D receptor structure [20] or the ligand graph and the receptor amino acid sequence [21–23]. Additionally, it is possible to train models that rely solely on docked poses, as is done by Liu et al. [24].…”
Section: Introduction (mentioning)
confidence: 99%