2017
DOI: 10.1007/s11696-017-0215-7

SCRAMBLE’N’GAMBLE: a tool for fast and facile generation of random data for statistical evaluation of QSAR models

Abstract: A common practice in modern QSAR modelling is to derive models by variable selection methods working on large descriptor pools. As pointed out previously, this is intrinsically burdened with the risk of finding random correlations. Therefore, it is desirable to perform tests showing the performance of models built on random data. In this contribution, we introduce a simple and freely available software tool, SCRAMBLE’N’GAMBLE, that is aimed at facilitating data preparation for y-randomization and pseudo-descripto…

Citations: cited by 27 publications (21 citation statements)
References: 58 publications
“…Secondly, all models achieved higher prediction accuracy than y-scrambling models (Fig. 1), demonstrating they all had a predictive power exceeding that of pure chance [30]. Thirdly, for LOCO-CV and external test set a slightly better predictive performance was found using the highest confidence dataset in comparison to the lower confidence datasets, although it should be noted that these models are not directly comparable given the varying dataset sizes (Fig.…”
Section: Results (mentioning)
confidence: 99%
“…demonstrating they all had a predictive power exceeding that of pure chance (28). Thirdly, for LOCO-CV and external test set a slightly better predictive performance was found using the highest confidence dataset in comparison to the lower confidence datasets (Fig.…”
Section: Predictive Modeling (mentioning)
confidence: 79%
“…This is a form of arrangement test, where the values of the dependent variable (y) are randomly assigned to different compounds, whereas the descriptor values (x’s) are left unchanged [29]. The rearranged data are then used for training QSAR models. As shown in Table 2, the value of λ (Wilks) increases significantly in all the validation models.…”
Section: Results (mentioning)
confidence: 99%
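
The y-randomization procedure described in the statement above (shuffle y across compounds, leave the descriptors untouched, refit the model) can be illustrated with a minimal sketch. This is not the SCRAMBLE’N’GAMBLE tool and does not reproduce any cited study: the data are synthetic, the model is assumed to be ordinary least squares, and the helper names `r_squared` and `y_randomization` are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """Training R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
    resid = y - A @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def y_randomization(X, y, n_rounds=100, seed=0):
    """Refit the model on y values randomly reassigned to compounds.

    X (the descriptor matrix) is left unchanged; only y is permuted.
    Returns one R^2 value per scrambling round.
    """
    rng = np.random.default_rng(seed)
    return np.array(
        [r_squared(X, rng.permutation(y)) for _ in range(n_rounds)]
    )


# Synthetic stand-in for a QSAR data set (assumption: 60 compounds,
# 3 descriptors, a linear response plus a little noise).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(60, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + data_rng.normal(scale=0.3, size=60)

r2_real = r_squared(X, y)
r2_scrambled = y_randomization(X, y)

print(f"R^2 on real data:          {r2_real:.3f}")
print(f"R^2 on scrambled y (mean): {r2_scrambled.mean():.3f}")
print(f"R^2 on scrambled y (max):  {r2_scrambled.max():.3f}")
```

If the model captures a genuine relationship, the R^2 on the real data should lie well above the whole distribution of scrambled values. A real R^2 that falls inside that distribution suggests a chance correlation.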