2013
DOI: 10.1016/j.aca.2012.11.007

Sample size planning for classification models

Abstract: In biospectroscopy, suitably annotated and statistically independent samples (e.g. patients, batches, etc.) for classifier training and testing are scarce and costly. Learning curves show the model performance as a function of the training sample size and can help to determine the sample size needed to train good classifiers. However, building a good model is not enough: the performance must also be proven. We discuss learning curves for typical small sample size situations with 5-25 independent samples…
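The abstract's point that performance "must also be proven" can be illustrated with a small calculation: for a fixed observed proportion of correct predictions, the confidence interval narrows only slowly as the number of independent test samples grows. The following is a minimal sketch, not the authors' procedure. It assumes a Wilson score interval at roughly 95% confidence, and the function names and the example sizes are hypothetical choices made for illustration.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion observed on n test samples.

    p_hat: observed fraction of correct predictions (e.g. sensitivity).
    n:     number of independent test samples (assumed, not taken from the paper).
    z:     standard normal quantile; 1.96 corresponds to about 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie in [0, 1]")
    denom = 1.0 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


def interval_widths(p_hat, test_sizes):
    """Width of the Wilson interval for each candidate test sample size."""
    widths = {}
    for n in test_sizes:
        low, high = wilson_interval(p_hat, n)
        widths[n] = high - low
    return widths


if __name__ == "__main__":
    # Illustrative sizes only: the small-sample range from the abstract plus larger sets.
    for n, width in interval_widths(0.9, [5, 10, 25, 100]).items():
        print(f"n = {n:3d}: interval width = {width:.3f}")
```

Running the script prints one interval width per test set size, which shows how much uncertainty remains around an observed 90% when only a handful of independent test samples are available.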

Cited by 388 publications (263 citation statements)
References 16 publications
“…Statistical significance is a critical consideration, and when patient diagnosis is the outcome, misclassification has serious consequences. Beleites et al, have carried out a study examining the effects of sample size on multivariate classifier models for clinical biospectroscopy [55]. It is demonstrated that, while learning curves for dataset sizes common to small scale academic studies can indicate acceptable performance, the model testing is itself limited by the dataset size and that datasets of 75-100 samples are required to produce "a good but not perfect classifier".…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…The concentration of proteins differs as well. Obtained data can be used for purposes of cell identification [7,38] or planning of further biospectroscopic experiments [5].…”
Section: Application to the Study of Cancer Cells (citation type: mentioning)
confidence: 99%
“…The tube lens L4 focuses the parallel beam of Raman signal. Depending on the mode of operation, the Raman signal is delivered to the SPG either via MMF (confocal Raman) or free-space via mirrors M3,4 and lenses L5,6.…”
Section: Detection of the Stokes Signal (citation type: mentioning)
confidence: 99%
“…It is a general phrase in literature that more samples in training gives more successful classification result [3]. But studies also focus on cost and time effect of classification process [4], [5].…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…Not only the quality, but also the sample size selected in training step plays important role on the results. More selected samples in training generate more accurate classification models hence, logically; the result success should be increased as well [3]. This theory can't be generalized over all datasets but mostly data set classification models give more successful separation with more training samples.…”
Section: Introduction (citation type: mentioning)
confidence: 99%