2021
DOI: 10.1021/acs.jpca.1c05031

DFT-Machine Learning Approach for Accurate Prediction of pKa

Abstract: In this study, we propose a novel method for pKa prediction across a diverse set of acids that combines density functional theory (DFT) with machine learning (ML). First, DFT at the B3LYP/6-31++G**/SM8 level is used to predict pKa, yielding a mean absolute error of 1.85 pKa units. These DFT-predicted pKa values are then used as one of 10 molecular descriptors for developing ML models trained on experimental data. Kernel Ridge Regression (KRR), Gaussian Process Reg…
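The abstract reports accuracy as a mean absolute error (MAE) in pKa units. As a minimal illustration of that metric, MAE is the average absolute difference between predicted and experimental pKa. The values below are hypothetical and are not data from the paper.

```python
def mean_absolute_error(predicted, experimental):
    """Mean absolute error (MAE), in pKa units, between two equal-length sequences."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


# Hypothetical illustrative values, not results from the paper.
dft_pka = [4.0, 10.5, 15.0]
exp_pka = [4.8, 9.9, 16.1]
print(round(mean_absolute_error(dft_pka, exp_pka), 3))  # 0.833
```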

Cited by 19 publications (19 citation statements)
References 73 publications
“…To obtain a more accurate estimate of pKa, these DFT predictions of pKa were corrected using the machine-learning algorithm described by Lawler et al. Specifically, the DFT-calculated pKa was used as a descriptor in a kernel ridge regression (KRR) algorithm (more details are given in the Supporting Information). The algorithm was trained against the experimental pKa data listed in Tables S6 and S7.…”
Section: Methods (mentioning)
confidence: 99%
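The correction step described in the passage above can be sketched as kernel ridge regression (KRR) in which the DFT-calculated pKa is one input feature and experimental pKa is the regression target. This is a minimal, dependency-free sketch, not the authors' implementation: the Gaussian (RBF) kernel, the hyperparameters `lam` and `gamma`, the centering of the targets, the second feature, and all names here are assumptions for illustration.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]  # work on a copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


class KernelRidge:
    """Kernel ridge regression: solve (K + lam*I) alpha = y - mean(y)."""

    def __init__(self, lam=1e-3, gamma=0.5):
        self.lam = lam      # ridge (regularization) strength
        self.gamma = gamma  # RBF width parameter

    def fit(self, X, y):
        self.X = [list(x) for x in X]
        self.y_mean = sum(y) / len(y)  # centering the targets is an assumption of this sketch
        n = len(self.X)
        K = [[rbf_kernel(self.X[i], self.X[j], self.gamma) + (self.lam if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.alpha = solve(K, [v - self.y_mean for v in y])
        return self

    def predict(self, X):
        return [self.y_mean + sum(a * rbf_kernel(x, xi, self.gamma)
                                  for a, xi in zip(self.alpha, self.X))
                for x in X]


# Hypothetical training data: [DFT pKa, number of aromatic rings] -> experimental pKa.
X_train = [[4.0, 1.0], [10.5, 0.0], [15.0, 0.0], [8.0, 1.0]]
y_train = [4.8, 9.9, 16.1, 7.5]
model = KernelRidge(lam=1e-3, gamma=0.5).fit(X_train, y_train)
print(model.predict([[9.0, 1.0]]))
```

In practice the features would usually be standardized first and `lam` and `gamma` tuned by cross-validation; the quoted passage defers such details to its Supporting Information.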
“…Besides the DFT-predicted pKa [marked as (6) in Tables S6–S8], five other descriptors (selected through recursive feature elimination) were used in the algorithm: (1) electronegativity of the acid’s central atom, (2) number of aromatic rings, (3) number of hydrogen atoms in the deprotonated structure, (4) molecular weight, and (5) solvation free energy of the protonated structure. We trained two separate KRR models, one for alcohols and amines, which is shown in Table S6, and another for stronger acids (e.g., carboxylic and sulfur-based), where we used training data for sulfur-based and carboxylic acids from the previous work, modifying those to include descriptors relevant to this model. These training data points are included in Table S7. …”
Section: Methods (mentioning)
confidence: 99%
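The six descriptors and the two-model split in the passage above could be organized as follows. This is a hypothetical sketch: the field names, the acid-class labels, the model identifiers, and the column standardization (added here because raw molecular weights would otherwise dominate a distance-based kernel) are assumptions, not details taken from the cited work.

```python
from dataclasses import dataclass

# Hypothetical mapping of acid classes onto the two models described above.
WEAK_ACID_CLASSES = {"alcohol", "amine"}
STRONG_ACID_CLASSES = {"carboxylic", "sulfur"}


@dataclass
class AcidDescriptors:
    """The six descriptors from the quoted passage (field names are hypothetical)."""
    central_atom_electronegativity: float    # (1)
    n_aromatic_rings: int                    # (2)
    n_hydrogens_deprotonated: int            # (3)
    molecular_weight: float                  # (4)
    solvation_free_energy_protonated: float  # (5)
    dft_pka: float                           # (6)

    def as_vector(self):
        """Flatten to a feature vector in the passage's (1)-(6) order."""
        return [
            self.central_atom_electronegativity,
            float(self.n_aromatic_rings),
            float(self.n_hydrogens_deprotonated),
            self.molecular_weight,
            self.solvation_free_energy_protonated,
            self.dft_pka,
        ]


def select_model(acid_class):
    """Route a molecule to the model trained for its acid class."""
    if acid_class in WEAK_ACID_CLASSES:
        return "alcohols_amines_model"
    if acid_class in STRONG_ACID_CLASSES:
        return "strong_acids_model"
    raise ValueError(f"no model trained for acid class {acid_class!r}")


def standardize(vectors):
    """Z-score each column; a zero-variance column is only centered."""
    n = len(vectors)
    cols = list(zip(*vectors))
    means = [sum(c) / n for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / n) ** 0.5 or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in vectors]


# Hypothetical phenol entry; the solvation energy and DFT pKa are illustrative numbers.
phenol = AcidDescriptors(3.44, 1, 5, 94.11, -6.6, 10.5)
print(select_model("alcohol"), phenol.as_vector())
```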
“…Although, when using conceptual DFT (combining molecular descriptors with the DFT results) [431] in a machine learning model, predictions were improved and allowed for extension of the technique to be used for the prediction of non-acidic compounds as well. This approach overall lowered the errors to ~1.85 pKa units [432]. ADME predictions can also be made by utilising global reactivity descriptors, such as the Fukui functions.…”
Section: QM/MM and DFT Approaches (mentioning)
confidence: 99%
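For the conceptual-DFT descriptors mentioned above, a common finite-difference recipe is sketched below. Note that the condensed Fukui function is strictly a local (atom-resolved) reactivity index, whereas chemical potential, hardness, and electrophilicity are global ones. The Koopmans-type estimates I ≈ −E_HOMO and A ≈ −E_LUMO, the factor of 1/2 in the hardness (some authors omit it), and the example numbers are assumptions; which descriptors the quoted work actually used is not stated here.

```python
def global_descriptors(e_homo, e_lumo):
    """Global conceptual-DFT indices from frontier orbital energies (e.g., in eV)."""
    if e_lumo <= e_homo:
        raise ValueError("expected E_LUMO > E_HOMO")
    ionization = -e_homo                 # Koopmans-type estimate of I
    affinity = -e_lumo                   # Koopmans-type estimate of A
    mu = -(ionization + affinity) / 2    # chemical potential
    eta = (ionization - affinity) / 2    # chemical hardness (convention with the 1/2 factor)
    omega = mu ** 2 / (2 * eta)          # electrophilicity index
    return {"mu": mu, "eta": eta, "omega": omega}


def condensed_fukui(q_neutral, q_anion, q_cation):
    """Condensed Fukui functions per atom from atomic charges of the N, N+1, N-1 electron systems.

    f_plus  = q(N) - q(N+1): susceptibility to nucleophilic attack
    f_minus = q(N-1) - q(N): susceptibility to electrophilic attack
    """
    f_plus = [qn - qa for qn, qa in zip(q_neutral, q_anion)]
    f_minus = [qc - qn for qc, qn in zip(q_cation, q_neutral)]
    return f_plus, f_minus


print(global_descriptors(-6.0, -2.0))  # {'mu': -4.0, 'eta': 2.0, 'omega': 4.0}
# Hypothetical two-atom charges for the neutral, anionic, and cationic species.
f_plus, f_minus = condensed_fukui([0.25, -0.25], [-0.3, -0.7], [0.6, 0.4])
print(f_plus, f_minus)
```

Because the anion carries one extra electron and the cation one fewer, each set of condensed Fukui values sums to 1 when the input charges are consistent.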
“…The recent development of machine learning (ML) methods bears promise for QSPR modeling through accurate and efficient property predictions of chemical compounds. Compared with conventional simulation methods like MD or DFT, ML methods have demonstrated similar accuracy but with less computational cost in various chemical applications. Especially, several works have explored applying ML models to solving ionic liquid problems via various descriptors of IL molecules. Group contribution theory (GC) is one of the earliest descriptors for IL molecules. GC manually breaks down a molecule into different characteristic functional groups and counts the existence frequency of each group.…”
Section: Introduction (mentioning)
confidence: 99%
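The group contribution (GC) idea described above reduces a molecule to counts of predefined functional groups. A minimal sketch follows, assuming the groups have already been identified. Real GC schemes define their own group sets and fragmentation rules, typically applied via substructure matching. The vocabulary and the [BMIM][BF4]-style fragmentation below are hypothetical.

```python
from collections import Counter

# Hypothetical group vocabulary; real GC schemes define their own group sets.
GROUP_VOCABULARY = ["CH3", "CH2", "imidazolium_ring", "BF4_anion", "PF6_anion"]


def group_count_vector(groups, vocabulary=GROUP_VOCABULARY):
    """Map a molecule's pre-identified groups to a fixed-length count vector."""
    counts = Counter(groups)
    unknown = set(counts) - set(vocabulary)
    if unknown:
        raise ValueError(f"groups outside the vocabulary: {sorted(unknown)}")
    return [counts[g] for g in vocabulary]  # Counter returns 0 for absent groups


# Hypothetical fragmentation of 1-butyl-3-methylimidazolium tetrafluoroborate.
bmim_bf4 = ["CH3", "CH2", "CH2", "CH2", "CH3", "imidazolium_ring", "BF4_anion"]
print(group_count_vector(bmim_bf4))  # [2, 3, 1, 1, 0]
```

The fixed vocabulary order is what makes the counts usable as an ML feature vector: every molecule maps to the same positions.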