2015
DOI: 10.1080/1062936x.2015.1084647

Consistency of QSAR models: Correct split of training and test sets, ranking of models and performance parameters

Abstract: Recent implementations of QSAR modeling software provide the user with numerous models and a wealth of information. In this work, we provide some guidance on how one should interpret the results of QSAR modeling, compare and assess the resulting models and select the best and most consistent ones. Two QSAR datasets are applied as case studies for the comparison of model performance parameters and model selection methods. We demonstrate the capabilities of sum of ranking differences (SRD) in model selection and…

Citations: cited by 99 publications (58 citation statements)
References: 41 publications
“…ref. [62]. It is clear that in this case in silico methods are close to the recommended logK OW (exp) values, while chromatographic estimations might seem to perform worse.…”
Section: Comparison Of Lipophilicity Measures By Means Of SRD and GPCM
Citation type: mentioning
confidence: 65%
“…Independently from this, our recent paper clearly shows that the ordering of merits for external validation is indistinguishable from random ranking [62]. Nevertheless we have carried out the SRD and the GPCM ranking of lipophilicity measures on a subset of compounds with logK OW values that are likely to be correctly measured with the shake-flask method (logK OW < 3 and determined with the shake-flask procedure which was verified through a meticulous tracing of the original articles, Table S1, Supplementary material).…”
Section: Comparison Of Lipophilicity Measures By Means Of SRD and GPCM
Citation type: mentioning
confidence: 99%
“…The values are presented together with computationally estimated logK OC -s in the In order to identify the best and the worst logK OC determination method non-parametric comparison by the SRD was applied on the entire set of logK OC values. The SRD method has been already successfully employed to rank and group variables, finding statistically significant differences even if the variables are highly correlated [40][41][42][43][44][45], which is the case with the present set of logK OC values.…”
Section: Determination Of logK OC Values and Comparison Of Chromatogr
Citation type: mentioning
confidence: 90%
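
The non-parametric SRD comparison mentioned in the statement above can be illustrated with a minimal sketch. It assumes the common formulation in which each method's values are ranked across the objects (compounds), a reference column is ranked the same way, and the SRD of a method is the sum of absolute rank differences. The function names `average_ranks` and `srd`, and the default choice of the row-wise mean as reference, are assumptions made for illustration; the cited papers may use a different reference and additionally validate SRD values against random rankings, which is not shown here.

```python
def average_ranks(values):
    """Rank values in ascending order (1-based); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srd(columns, reference=None):
    """Sum of ranking differences for each method.

    columns:   dict mapping method name -> list of values, one per object,
               all lists in the same object order.
    reference: list of reference values per object. If omitted, the row-wise
               mean over all methods is used (an assumption of this sketch).
    """
    names = list(columns)
    n_objects = len(columns[names[0]])
    if reference is None:
        reference = [
            sum(columns[m][i] for m in names) / len(names)
            for i in range(n_objects)
        ]
    ref_ranks = average_ranks(reference)
    return {
        m: sum(abs(r - q) for r, q in zip(average_ranks(columns[m]), ref_ranks))
        for m in names
    }
```

A smaller SRD means the method orders the objects more like the reference does; a value of zero means the two orderings are identical.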
“…Here, N is the number of compounds of the training set, R² is the coefficient of determination, R²_adj is adjusted R², s is standard error of estimate, F is variance ratio, LOF is Friedman lack of fit [41,42], Kxx is the correlation among descriptors [38], Delta K is the difference of the correlation between the descriptors (Kx) and the descriptors plus the responses (Kxy), RMSE_tr is Root Mean Square Error in fitting (for training set), MAE_tr is Mean Absolute Error in fitting (calculated on training set), RSS_tr is Residual Sum of Squares in fitting (also for training set) and CCC_tr is the concordance correlation coefficient calculated over the training set [43,44,45]. The model projects an R² value is of 0.8737, which means a proper fitness for modelling Syk protein inhibition.…”
Section: QSAR Model Construction and Validation
Citation type: mentioning
confidence: 99%
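
Several of the fitting parameters listed in the statement above (R², adjusted R², RMSE, MAE, RSS and CCC on the training set) can be computed from observed and fitted values alone. The sketch below uses their textbook definitions and is not taken from the citing paper or from any particular QSAR package; the function name `fit_metrics` and the argument `n_descriptors` are hypothetical, RMSE is assumed to divide by N, and CCC is assumed to follow Lin's concordance correlation coefficient with 1/N variances. Software packages may use different denominators.

```python
import math


def fit_metrics(y_obs, y_pred, n_descriptors):
    """Training-set fit statistics from observed and fitted responses.

    y_obs, y_pred:  equal-length sequences of observed and fitted values.
    n_descriptors:  number of descriptors in the model (hypothetical argument,
                    needed only for the adjusted R^2).
    """
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    mean_pred = sum(y_pred) / n

    # Residual and total sums of squares.
    rss = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    tss = sum((o - mean_obs) ** 2 for o in y_obs)

    r2 = 1 - rss / tss
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - n_descriptors - 1)
    rmse = math.sqrt(rss / n)  # assumption: denominator N
    mae = sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / n

    # Lin's concordance correlation coefficient with 1/N (population) moments.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(y_obs, y_pred)) / n
    var_obs = tss / n
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred) / n
    ccc = 2 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2)

    return {"R2": r2, "R2_adj": r2_adj, "RMSE": rmse,
            "MAE": mae, "RSS": rss, "CCC": ccc}
```

Unlike R², the CCC penalises a systematic offset between observed and fitted values: shifting every fitted value by a constant leaves their correlation unchanged but lowers the CCC.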