2006
DOI: 10.1021/ci050413p

QSAR − How Good Is It in Practice? Comparison of Descriptor Sets on an Unbiased Cross Section of Corporate Data Sets

Abstract: The quality of QSAR (Quantitative Structure-Activity Relationships) predictions depends on a large number of factors including the descriptor set, the statistical method, and the data sets used. Here we study the quality of QSAR predictions mainly as a function of the data set and descriptor type using partial least squares as the statistical modeling method. The study makes use of the fact that we have access to a large number of data sets and to a variety of different QSAR descriptors. The main conclusions a…

Citations: cited by 163 publications (142 citation statements)
References: 26 publications
“…The above finding is somewhat worrying given the large dataset employed here to train the model (~7000), the use of a neural network method according to best practice as defined by the software vendor, as well as commonly used descriptor types [29,30]. This result suggests great care should be taken when training QSAR models, especially on relatively small datasets using complex, non-linear statistical methods, coupled with large, often sparsely populated descriptor sets.…”
Section: Relationship Between the Distance To Model And The Predictio
mentioning
confidence: 98%
“…These range from the so-called rules of thumb (e.g., rule-of-5, polar surface area) to quantitative prediction approaches: QSAR and quantitative structure-property relationship (QSPR) up to classification models, and similarity searches, molecular modeling (structure-based approaches such as ligand-protein docking, pharmacophore modeling, substructure, quantum mechanics) and physiologically based pharmacokinetic (PBPK) modeling [74,75,81-83] (Figures 2.9-2.11). While PBPK models have received a lot of attention because they may provide valuable information on how various factors influence PK, they are not discussed because these methods usually need experimental data and cannot be developed solely from the molecular structures of the compounds (see for instance Ref.…”
Section: 23
mentioning
confidence: 99%
“…Figure 1B shows that one could obtain an accurate solubility model (r² = 0.85, MAE = 0.61) if one were to combine the outcome of a coarse solubility assay that could only tell whether a compound is soluble (< 10⁻⁴ mol/L) or not (> 10⁻² mol/L) with much fewer quantitative solubility data. We use a standard dataset of the solubility of 1144 organic molecules [27], and describe the molecule by concatenating the Avalon Fingerprint [28], the MACCS Fingerprint [29], and the 1024-bit Morgan6 Fingerprint [24]. Our result compares favourably with other models that also use binary molecular fingerprints, e.g.…”
mentioning
confidence: 91%
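
The statement above describes a molecule by concatenating several binary fingerprints and reports model quality as r² and MAE. A minimal sketch of those two ideas in plain Python follows. The helper names (`concat_fingerprints`, `mean_absolute_error`, `r_squared`) and the toy bit vectors are hypothetical and are not taken from the cited work; real Avalon, MACCS, or Morgan fingerprints would come from a cheminformatics toolkit. The sketch also assumes r² means the coefficient of determination, which the quoted statement does not specify.

```python
from typing import List, Sequence


def concat_fingerprints(*fps: Sequence[int]) -> List[int]:
    """Join several binary fingerprints into one flat feature vector."""
    out: List[int] = []
    for fp in fps:
        # Reject anything that is not a 0/1 bit so the result stays binary.
        if any(bit not in (0, 1) for bit in fp):
            raise ValueError("fingerprints must contain only 0/1 bits")
        out.extend(fp)
    return out


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("r_squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Toy stand-ins for three fingerprints of one molecule (not real data).
features = concat_fingerprints([1, 0, 1], [0, 1], [1, 1, 0, 0])

# Toy observed vs. predicted values, used only to exercise the metrics.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
```

Concatenation keeps each fingerprint's bits in a fixed block of the feature vector, so a downstream model sees the same column layout for every molecule as long as the fingerprints are passed in the same order.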