2011
DOI: 10.1186/1471-2105-12-153
|View full text |Cite
|
Sign up to set email alerts
|

To aggregate or not to aggregate high-dimensional classifiers

Abstract: BackgroundHigh-throughput functional genomics technologies generate large amount of data with hundreds or thousands of measurements per sample. The number of sample is usually much smaller in the order of ten or hundred. This poses statistical challenges and calls for appropriate solutions for the analysis of this kind of data.ResultsPrincipal component discriminant analysis (PCDA), an adaptation of classical linear discriminant analysis (LDA) for high-dimensional data, has been selected as an example of a bas… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1

Citation Types

0
1
0

Year Published

2012
2012
2021
2021

Publication Types

Select...
5
1

Relationship

0
6

Authors

Journals

citations
Cited by 9 publications
(1 citation statement)
references
References 27 publications
0
1
0
Order By: Relevance
“…Our finding of voting superiority agrees with four other studies: one using multiple different feature extraction methods in combination with SVM for gene microarray classification showed that using a voting-based method across all the examined combinations achieved a better 10-fold cross-validated classification performance compared to any single combination [33]; a second study showed that an SVM-based ensemble outperformed single SVM for microarray data classification [42]; a third study comparing the performance of principal component discriminant classifiers either with or without voting using cross-validation applied on a simulated dataset a leukemia microarray gene expression dataset, a Gaucher serum proteomics dataset and a grape extract metabolomics dataset, also showed that voting had a better performance than the nonvoting method [43]; a fourth study comparing the performance of single models to combined models in thirteen diverse microarray datasets, which included predictions of estrogen receptor positivity and complete pathological response to chemotherapy in breast cancer, found that the majority of combined multiple models had a better classification performance than the single models [21]. Furthermore, our findings also agree with a study by Taylor and Kim who, by splitting their original datasets into training and test parts, showed that voting based on nearest mean voters was a top performing method with respect to classification on lung or prostate cancer data.…”
Section: Conclusion and Discussionmentioning
confidence: 99%
“…Our finding of voting superiority agrees with four other studies: one using multiple different feature extraction methods in combination with SVM for gene microarray classification showed that using a voting-based method across all the examined combinations achieved a better 10-fold cross-validated classification performance compared to any single combination [33]; a second study showed that an SVM-based ensemble outperformed single SVM for microarray data classification [42]; a third study comparing the performance of principal component discriminant classifiers either with or without voting using cross-validation applied on a simulated dataset a leukemia microarray gene expression dataset, a Gaucher serum proteomics dataset and a grape extract metabolomics dataset, also showed that voting had a better performance than the nonvoting method [43]; a fourth study comparing the performance of single models to combined models in thirteen diverse microarray datasets, which included predictions of estrogen receptor positivity and complete pathological response to chemotherapy in breast cancer, found that the majority of combined multiple models had a better classification performance than the single models [21]. Furthermore, our findings also agree with a study by Taylor and Kim who, by splitting their original datasets into training and test parts, showed that voting based on nearest mean voters was a top performing method with respect to classification on lung or prostate cancer data.…”
Section: Conclusion and Discussionmentioning
confidence: 99%