2020
DOI: 10.1021/acs.jcim.9b01067

Experimental Error, Kurtosis, Activity Cliffs, and Methodology: What Limits the Predictivity of Quantitative Structure–Activity Relationship Models?

Abstract: Given a particular descriptor/method combination, some quantitative structure–activity relationship (QSAR) datasets are very predictive by random-split cross-validation while others are not. Recent literature in modelability suggests that the limiting issue for predictivity is in the data, not the QSAR methodology, and the limits are due to activity cliffs. Here, we investigate, on in-house data, the relative usefulness of experimental error, distribution of the activities, and activity cliff metrics in determ…
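The abstract names the distribution of the activities as one candidate limit on predictivity, and the title puts kurtosis in that role. The excerpt stops before it defines the statistic. The sketch below therefore assumes plain population excess kurtosis of the activity values, which is 0 for a normal distribution. The function name and the example values are hypothetical.

```python
from statistics import fmean


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution).

    Positive values mean heavier tails (a few extreme activities); negative
    values mean a flatter, more evenly spread set of activities.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two activity values")
    mu = fmean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mu) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("all activities identical; kurtosis is undefined")
    return m4 / m2 ** 2 - 3.0


# Hypothetical pIC50 values, for illustration only.
spread_out = [5.0, 6.0, 7.0, 8.0, 9.0]
one_outlier = [5.0] * 9 + [9.0]
print(excess_kurtosis(spread_out))   # negative: flat, evenly spread activities
print(excess_kurtosis(one_outlier))  # positive: one extreme value dominates
```

A dataset whose activities cluster tightly, with only a few extreme values, has little spread for a model to explain. That is one plausible reason a distribution statistic could track modelability.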

Cited by 49 publications (65 citation statements); references 32 publications.
“…In general, machine learning approaches perform better with multiple experimental observations and molecular diversity. Repeated observations are useful for quantifying experimental variability in the assay and therefore the limits of predictability [1,17]. Molecular diversity in the training data allows models to generalize to a wide range of molecular scaffolds (global diversity) as well as learn nuances from smaller functional group perturbations (localized diversity).…”
Section: Measuring How Models Generalize For Medicinal Chemistry (mentioning)
confidence: 99%
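Repeated observations let the assay's variability be estimated directly. The sketch below pools the within-compound variance across every compound measured more than once and returns it as a standard deviation in log units. This is one standard way to summarize replicate noise; the quoted statement does not prescribe it. The function name, the input layout and the numbers are hypothetical.

```python
import math
from statistics import fmean


def pooled_replicate_sd(replicates):
    """Pooled within-compound standard deviation from repeated measurements.

    replicates: dict mapping a compound id to a list of repeated activity
    values (log units). Compounds measured only once contribute no degrees
    of freedom and are skipped.
    """
    sum_sq = 0.0
    dof = 0
    for values in replicates.values():
        if len(values) < 2:
            continue
        mean = fmean(values)
        sum_sq += sum((v - mean) ** 2 for v in values)
        dof += len(values) - 1
    if dof == 0:
        raise ValueError("no compound has repeated measurements")
    return math.sqrt(sum_sq / dof)


# Hypothetical replicate pIC50 values.
data = {"cpd-1": [6.0, 6.4], "cpd-2": [7.0, 7.2, 7.4], "cpd-3": [5.5]}
print(round(pooled_replicate_sd(data), 3))
```

Pooling uses every replicate pair without letting single-measurement compounds pull the estimate toward zero.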
“…Publications with essential experimental controls reported, such as incubation time and concentration regime to demonstrate equilibrium, can add confidence to the reported affinity; however, these may be performed and not reported [104]. Meta-analyses of both repeatability [105] and reproducibility [103] found errors in pKi of 0.3–0.4 log units (0.43–0.58 kcal mol⁻¹) and 0.44 log units (0.64 kcal mol⁻¹), respectively. Another analysis for reproducibility found that variability in pIC50 was even 21–26% higher than for pKi data (0.55 log units) [100].…”
Section: Experimental Uncertainty (mentioning)
confidence: 99%
“…where $R^2_{\max}$ is the highest achievable $R^2$ for a dataset with a standard deviation of affinities σ(affinity) and an experimental uncertainty σ(measurement error) [105]. This relation is illustrated in Figure 7.…”
Section: Ensuring Sufficient Statistical Power (mentioning)
confidence: 99%
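The equation this statement refers to is cut off in the excerpt. A commonly used form assumes the measurement error is independent of the true affinities, so that var(observed) = var(true) + var(error). Under that assumption, even a perfect model of the true values can explain at most $R^2_{\max} = 1 - \sigma^2(\text{measurement error}) / \sigma^2(\text{affinity})$. The minimal sketch below evaluates that assumed form; the function name is hypothetical.

```python
def max_achievable_r2(sd_affinity, sd_error):
    """Assumed ceiling on R^2: 1 - (sd_error / sd_affinity) ** 2.

    sd_affinity: standard deviation of the observed affinities (log units).
    sd_error: experimental uncertainty of a single measurement (log units).
    Clamped at 0, since sample estimates can give sd_error > sd_affinity.
    """
    if sd_affinity <= 0:
        raise ValueError("sd_affinity must be positive")
    if sd_error < 0:
        raise ValueError("sd_error must be non-negative")
    return max(0.0, 1.0 - (sd_error / sd_affinity) ** 2)


print(max_achievable_r2(1.0, 0.5))   # 0.75
print(max_achievable_r2(1.0, 0.44))  # about 0.81
```

Take a dataset that spans only about one log unit (standard deviation 1.0) and carries the 0.44 log-unit reproducibility error quoted in the Experimental Uncertainty statement. Under this assumption, no model could be expected to exceed an $R^2$ of roughly 0.8, however good the descriptors.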
“…This approach enabled the prediction of ACs of varying magnitude. However, as mentioned above, potency predictions for AC compounds using QSAR approaches are generally difficult, regardless of descriptors and methods used [40]. This is the case because QSAR modeling is principally based on the presence of SAR continuity when gradual changes in molecular structure cause small to moderate changes in potency.…”
Section: Prediction Of Activity Cliffs (mentioning)
confidence: 99%
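An activity cliff is a pair of structurally similar compounds with a large potency difference, which is exactly where the SAR-continuity assumption breaks. The excerpt does not say which cliff metrics the paper used. One widely used choice, shown here as an illustration, is the Structure-Activity Landscape Index, $\mathrm{SALI}_{ij} = |A_i - A_j| / (1 - \mathrm{sim}(i, j))$. In this sketch, fingerprints are represented as Python sets of on-bit indices, which is a simplifying assumption. The helper names and example data are hypothetical.

```python
import math
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def sali(act_i, act_j, fp_i, fp_j):
    """SALI = |A_i - A_j| / (1 - sim); large values flag activity cliffs."""
    sim = tanimoto(fp_i, fp_j)
    diff = abs(act_i - act_j)
    if sim >= 1.0:
        # Identical fingerprints: infinite index if potencies differ at all.
        return math.inf if diff > 0 else 0.0
    return diff / (1.0 - sim)


def top_cliffs(activities, fingerprints, k=3):
    """Rank all compound pairs by SALI and return the k largest as (sali, i, j)."""
    scored = [
        (sali(activities[i], activities[j], fingerprints[i], fingerprints[j]), i, j)
        for i, j in combinations(sorted(activities), 2)
    ]
    scored.sort(reverse=True)
    return scored[:k]


# Hypothetical pIC50 values and fingerprints.
acts = {"a": 8.0, "b": 5.0, "c": 7.9}
fps = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 5}, "c": {10, 11}}
print(top_cliffs(acts, fps, k=1))  # pair (a, b): similar structures, 3 log units apart
```

Pair (a, c) differs by only 0.1 log units and shares no bits, so it scores near zero. Pair (a, b) is 60% similar yet three log units apart, which is the kind of pair a smooth QSAR model struggles to predict.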