2004
DOI: 10.1093/bioinformatics/bth267
A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression

Abstract: This paper studies the problem of building multiclass classifiers for tissue classification based on gene expression. The recent development of microarray technologies has enabled biologists to quantify gene expression of tens of thousands of genes in a single experiment. Biologists have begun collecting gene expression for a large number of samples. One of the urgent issues in the use of microarray data is to develop methods for characterizing samples based on their gene expression. The most basic step in the…

Cited by 581 publications (319 citation statements)
References 32 publications
“…For instance, in the Leukemia set results, at a confidence level of t = 200, four variables with four arcs correctly predict 70% of the samples; in the CRC domain, at a level of t = 310, the estimation of the accuracy with only four variables and three arcs achieves a mean value of 96%. This fact corroborates other studies regarding gene expression classification based on a reduced number of genes [4,12,48]. The cardinality of the highest configured arc is included.…”
Section: Classification Accuracy (supporting, confidence: 90%)
“…Initialize the mixing coefficient, α_m, for each component, m, in the grid to 1/M;
Set the mean and the variance of the shared distribution, q(·|λ), as the mean and covariance of the training set;
repeat
    Compute R, U and V using (6), (7) and (8), respectively, with the current parameters, Θ; (9); end
    Obtain the center, µ_m, of each component, m, of the mixture in the data space, using (11);
    Re-estimate the width of the diagonal Gaussians, σ_d, using (12), for all the features;
    Re-estimate the mean and the variance of the shared distribution using (13) and (14), respectively;
    Re-estimate the feature weight, ρ_d, using (15), for all the features;
until convergence;
end
The parameters are estimated using a variant of the EM algorithm as follows.…”
Section: GTM with Feature Saliency (GTM-FS) (mentioning, confidence: 99%)
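The excerpt above describes an EM-style alternation, but its actual update equations (6)–(15) are not reproduced in the excerpt. As a hedged illustration of the same loop structure only, here is a minimal EM fit of a plain one-dimensional, two-component Gaussian mixture; the data, initialization, and variable names are invented for this sketch and are not the GTM-FS updates:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data from two well-separated Gaussians
# (a stand-in for a single expression feature, not real microarray data).
x = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(3, 1.0, 100)])

M = 2                                   # number of mixture components
alpha = np.full(M, 1.0 / M)             # mixing coefficients, initialized to 1/M
mu = np.array([x.min(), x.max()])       # deterministic, well-separated init
sigma = np.full(M, x.std())             # component standard deviations

def gauss(x, mu, sigma):
    """Univariate Gaussian density, broadcast over components."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: responsibilities R[n, m] (analogue of the excerpt's R matrix).
    R = alpha * gauss(x[:, None], mu, sigma)
    R /= R.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing weights, means, and widths.
    Nm = R.sum(axis=0)
    alpha = Nm / len(x)
    mu = (R * x[:, None]).sum(axis=0) / Nm
    sigma = np.sqrt((R * (x[:, None] - mu) ** 2).sum(axis=0) / Nm)

print(np.round(np.sort(mu), 1))  # component means recovered near -2 and 3
```

The point of the sketch is only the repeat/until-convergence alternation: responsibilities from current parameters, then parameter re-estimation from responsibilities, exactly the skeleton the quoted algorithm follows with its own equations.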
“…Filter methods score the merits of variables using intrinsic data properties such as information, distance, dependency and consistency, and then select a subset of variables as a preprocessing step independently of the choice of learning machine (Dhillon, et al, 2003;Torkkola, 2003;Li, et al, 2004;Yang and Pedersen 1997;Bolon-Canedo et al, 2012;Forman, 2004;You and Li, 2011;Rajapakse and Mundra, 2013). Filter methods usually are fast, but because they do not consider variable subsets' effects on the learning process, they can select a redundant one.…”
Section: Introduction (mentioning, confidence: 99%)
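A filter method of the kind the excerpt describes can be sketched in a few lines: score each gene with a classifier-independent statistic and keep the top-ranked subset as a preprocessing step. The Fisher-style score and the synthetic two-class expression matrix below are illustrative assumptions, not taken from any of the cited works:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic expression matrix: 60 samples x 200 genes, two classes.
# Genes 0-4 are made informative by shifting their mean in class 1.
X = rng.normal(size=(60, 200))
y = np.array([0] * 30 + [1] * 30)
X[y == 1, :5] += 2.0

def f_score(X, y):
    """Univariate Fisher-style score per gene: between-class mean
    separation over within-class spread, independent of any classifier."""
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0) + 1e-12
    return num / den

scores = f_score(X, y)
top = np.argsort(scores)[::-1][:5]      # keep the 5 highest-scoring genes
print(sorted(top.tolist()))             # → [0, 1, 2, 3, 4], the informative genes
```

Because the score is computed per gene with no reference to a learning machine, it is fast, but, as the excerpt notes, it cannot detect that two selected genes carry redundant information about the class.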