2011
DOI: 10.1093/bib/bbq073

An empirical assessment of validation practices for molecular classifiers

Abstract: Proposed molecular classifiers may be overfit to idiosyncrasies of noisy genomic and proteomic data. Cross-validation methods are often used to obtain estimates of classification accuracy, but both simulations and case studies suggest that, when inappropriate methods are used, bias may ensue. Bias can be bypassed and generalizability can be tested by external (independent) validation. We evaluated 35 studies that have reported on external validation of a molecular classifier. We extracted information on study …

Cited by 86 publications (63 citation statements)
References 83 publications
“…However, in that study, class discovery was repeated in the replication sample as opposed to the approach used in this analysis where all key model parameters were learned in the training data and directly transferred to the replication set. The latter approach provides a more stringent assessment of generalizability [4,16]. …”
Section: Discussion (mentioning)
confidence: 99%
“…Signatures for several diseases are in clinical use, but many gene expression signatures are poorly reproducible and suffer from sampling dependent instability [1,2]. While inappropriate analysis methods account for some of the poor reproducibility of published signatures [3,4], another potential cause is the presence of unrecognized biologic variability, such as occult molecular disease subtypes. Interestingly, one perspective on gene expression data is that it contains too much information, i.e.…”
Section: Introduction (mentioning)
confidence: 99%
“…Other studies identifying and analyzing gene signatures have shown impressive results; however, these studies often lacked rigorous data analysis and validation (26). Conversely, the gene signature, based on data from previous gene expression profiling studies, was independently selected and validated in separate settings.…”
Section: Discussion (mentioning)
confidence: 99%
“…This procedure is repeated many times until each sample has been in the test set exactly once. The accuracy of the model on the test samples gives an estimate of the predictive power and the robustness of the model to perturbations of the data (Castaldi et al, 2011; Ioannidis and Khoury, 2011). …”
Section: Metabolomics Data Analysis (mentioning)
confidence: 99%
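
The cross-validation procedure described in the last statement, where each sample ends up in the test set exactly once, can be illustrated with a minimal sketch. This is not the method of any cited study: the nearest-centroid classifier and the function names (`k_fold_indices`, `fit_centroids`, `predict`, `cross_validate`) are assumptions chosen only to keep the example self-contained. The point it tries to show is that all model parameters are learned from the training folds alone and then applied unchanged to the held-out fold.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split sample indices into k disjoint folds (hypothetical helper).

    Every index lands in exactly one fold, so each sample is tested once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Learn one mean vector per class from the training data only."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


def cross_validate(X, y, k=5, seed=0):
    """Return the fraction of samples classified correctly while held out."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Parameters come from the training folds; the test fold is never seen.
        centroids = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(centroids, X[i]) == y[i] for i in test_idx)
    return correct / len(X)
```

Any step that uses the class labels, such as feature selection, would also need to sit inside the loop; running it on the full data set beforehand is one of the inappropriate practices that the abstract warns may introduce bias. External validation differs in that the held-out samples come from an independent data set rather than from a fold of the same one.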