2015
DOI: 10.1371/journal.pone.0119301

Systematic Artifacts in Support Vector Regression-Based Compound Potency Prediction Revealed by Statistical and Activity Landscape Analysis

Abstract: Support vector machines are a popular machine learning method for many classification tasks in biology and chemistry. In addition, the support vector regression (SVR) variant is widely used for numerical property predictions. In chemoinformatics and pharmaceutical research, SVR has become the probably most popular approach for modeling of non-linear structure-activity relationships (SARs) and predicting compound potency values. Herein, we have systematically generated and analyzed SVR prediction models for a v…

Citations: cited by 28 publications (22 citation statements)
References: 27 publications
“…With respect to studies that reported systematic underprediction affecting peak values (e.g., Balfer and Bajorath 2015; Garsole and Rajurkar 2015; Granata et al 2016), this study was unable to access their data sets to determine their data distributions and verify the causes of underprediction. Nevertheless, the problem of underprediction affecting extreme-high dependent values is present in the modeling results of these studies, which corroborate the observation here that only underprediction is present in the prediction of extreme-high values.…”
Section: Analysis Of Systematic Underprediction
Citation type: mentioning
confidence: 96%
“…In the case of the 2000-2014 data set for Yuen Long Creek, extreme-high BODs are subject to underprediction. Inherently, the systematic underprediction results from an SVR attempting to minimize both prediction errors and model complexity (Balfer and Bajorath 2015). As extreme-high values often represent only a small proportion of the data set, the SVR will algorithmically tolerate prediction errors in an effort to derive a sufficiently complex model that provides accurate predictions for the majority of the data, that is, the nonextreme values.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
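
The trade-off described in the statement above can be illustrated with a toy sketch. It is not the cited authors' method or data: it assumes a one-dimensional linear model, hypothetical targets that follow y = x exactly, and a brute-force grid search over the standard ε-insensitive SVR objective (complexity term plus C times the errors outside the ε-tube) in place of a real SVR solver. The names `svr_objective` and `fit_by_grid`, and the values of `c` and `eps`, are illustrative choices.

```python
# Toy illustration only: linear epsilon-insensitive SVR objective in one
# dimension, minimized by brute-force grid search (not a real SVR solver).

def svr_objective(w, b, xs, ys, c, eps):
    """0.5 * w^2 (complexity) + C * sum of epsilon-insensitive errors."""
    penalty = sum(max(0.0, abs(y - (w * x + b)) - eps) for x, y in zip(xs, ys))
    return 0.5 * w * w + c * penalty


def fit_by_grid(xs, ys, c=1.0, eps=0.5):
    """Return the (w, b) grid point with the lowest objective value."""
    best = None
    for i in range(0, 151):          # w in [0.00, 1.50], step 0.01
        w = i / 100
        for j in range(-200, 201):   # b in [-2.00, 2.00], step 0.01
            b = j / 100
            value = svr_objective(w, b, xs, ys, c, eps)
            if best is None or value < best[0]:
                best = (value, w, b)
    return best[1], best[2]


# Hypothetical data: targets follow y = x exactly, so an unregularized
# fit would have slope 1 and zero error everywhere.
xs = [float(x) for x in range(11)]
ys = list(xs)

w, b = fit_by_grid(xs, ys)
pred_high = w * xs[-1] + b   # prediction for the largest target
pred_low = w * xs[0] + b     # prediction for the smallest target
print(f"slope={w:.2f}, intercept={b:.2f}")
print(f"highest target {ys[-1]:.1f} -> predicted {pred_high:.2f}")
print(f"lowest target {ys[0]:.1f} -> predicted {pred_low:.2f}")
```

In this toy setting, errors inside the ε-tube cost nothing, so the objective favors the flattest line that still keeps every point within the tube. The fitted slope should therefore come out below 1, and the largest target is predicted too low even though the data are noise-free. Real SVR models with non-linear kernels and skewed potency distributions are more involved, so this only sketches the mechanism.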
“…It is important to recognize that extrapolation is considered a challenging problem for ML methods, especially when these methods are used without an adaptive learning scheme or iterative feedback loop [43]. Balfer and Bajorath [44] showed that when SVR methods are used for building quantitative structure-activity relationships, these methods systematically underestimate the true value of high-potency compounds due to the regularization term in the loss function that balances the trade-off between model complexity and its ability to generalize to an unseen data point. Therefore, our ML prediction for Sn-substituted FeGe needs further attention and cannot be discarded solely on the basis of ML predictions.…”
Section: Invited Feature Paper
Citation type: mentioning
confidence: 99%
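
For orientation, the regularization trade-off mentioned in the statement above appears in the textbook ε-insensitive SVR primal problem, written out below. This is the standard formulation, not necessarily the exact setup used in the cited paper. The symbols are the usual ones and are not taken from the citing text: weight vector w, bias b, feature map φ, regularization constant C, tube width ε, and slack variables ξ and ξ*.

```latex
\min_{w,\,b,\,\xi,\,\xi^*} \quad
  \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right)
\quad \text{subject to} \quad
\begin{cases}
  y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \\
  \langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^*, \\
  \xi_i,\ \xi_i^* \ge 0 .
\end{cases}
```

The first term penalizes model complexity and the second penalizes errors larger than ε. A smaller C weights the complexity term more heavily, so larger errors on a few extreme training values are tolerated.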
“…The earliest work with QSPR was the prediction of efficacy relationship models. Some recent work and representative work with QSPR are prediction of activity cliffs, ligand efficiencies, compound potency, the regulative role of atomic autocorrelated electronegativities and polarizabilities in beta 2 potency, lipophilicity, and detonation velocity. Balabin et al compared SVR with artificial neural network regression using NIR data obtained from 14 sets of petroleum fuels and products.…”
Section: Introduction
Citation type: mentioning
confidence: 99%