A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression

Li, Tao; Zhang, Chengliang; Ogihara, Mitsunori

doi:10.1093/bioinformatics/bth267

Cited by 581 publications

(319 citation statements)

References 32 publications

Supporting

Mentioning

302

Contrasting

Unclassified


“…For instance, in the Leukemia set results, at a confidence level of t = 200, four variables with four arcs correctly predict 70% of the samples; in the CRC domain, at a level of t = 310, the estimation of the accuracy with only four variables and three arcs achieves a mean value of 96%. This fact corroborates other studies regarding gene expression classification based on a reduced number of genes [4,12,48]. The cardinality of the highest configured arc is included.…”

Section: Classification Accuracy (supporting)

confidence: 90%

Detecting reliable gene interactions by a hierarchy of Bayesian network classifiers

Armañanzas

Inza

Larrañaga

2008

Computer Methods and Programs in Biomedicine


The main purpose of a gene interaction network is to map the relationships of the genes that are out of sight when a genomic study is tackled. DNA microarrays allow the measure of gene expression of thousands of genes at the same time. These data constitute the numeric seed for the induction of the gene networks. In this paper, we propose a new approach to build gene networks by means of Bayesian classifiers, variable selection and bootstrap resampling. The interactions induced by the Bayesian classifiers are based both on the expression levels and on the phenotype information of the supervised variable. Feature selection and bootstrap resampling add reliability and robustness to the overall process, removing the false positive findings. The consensus among all the induced models produces a hierarchy of dependences and, thus, of variables. Biologists can define the depth level of the model hierarchy so the set of interactions and genes involved can vary from a sparse to a dense set. Experimental results show how these networks perform well on classification tasks. The biological validation matches previous biological findings and opens new hypotheses for future studies.

Keywords: Bayesian network classifiers; Robust arc identification; Gene interactions; DNA microarrays; Knowledge discovery


“…Initialize the mixing coefficient, α_m, for each component, m, in the grid to 1/M;
Set the mean and the variance of the shared distribution, q(·|λ), as the mean and covariance of the training set;
repeat
    Compute R, U and V using (6), (7) and (8) respectively, using current parameters, Θ;
    (9); end
    Obtain the center, µ_m, of each component, m, of the mixture in the data space, using (11);
    Reestimate the width of the diagonal Gaussians, σ_d, using (12), for all the features;
    Reestimate the mean and the variance of the shared distribution using (13) and (14) respectively;
    Reestimate the feature weight, ρ_d, using (15), for all the features;
until convergence;
end
The parameters are estimated using a variant of the EM algorithm as follows.…”

Section: GTM with Feature Saliency (GTM-FS) (mentioning)

confidence: 99%

Data Visualization with Simultaneous Feature Selection

Maniyar

Nabney

2006

2006 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology


Abstract: Data visualization algorithms and feature selection techniques are both widely used in bioinformatics but as distinct analytical approaches. Until now there has been no method of deciding feature saliency while training a data visualization model. We derive a generative topographic mapping (GTM) based data visualization approach which estimates feature saliency simultaneously with the training of the visualization model. The approach not only provides a better projection by modeling irrelevant features with a separate noise model but also gives feature saliency values which help the user assess the significance of each feature. We compare the quality of the projection obtained using the new approach with the projections from traditional GTM and self-organizing maps (SOM) algorithms. The results obtained on a synthetic and a real-life chemoinformatics dataset demonstrate that the proposed approach successfully identifies feature significance and provides coherent (compact) projections.


“…Filter methods score the merits of variables using intrinsic data properties such as information, distance, dependency and consistency, and then select a subset of variables as a preprocessing step independently of the choice of learning machine (Dhillon, et al, 2003; Torkkola, 2003; Li, et al, 2004; Yang and Pedersen 1997; Bolon-Canedo et al, 2012; Forman, 2004; You and Li, 2011; Rajapakse and Mundra, 2013). Filter methods usually are fast, but because they do not consider variable subsets' effects on the learning process, they can select a redundant one.…”

Section: Introduction (mentioning)

confidence: 99%

Variable selection methods for multi-class classification using signomial function

Hwang

Lee

Park

2017

Journal of the Operational Research Society


We develop several variable selection methods using signomial function to select relevant variables for multiclass classification by taking all classes into consideration. We introduce an ℓ1-norm regularization function to measure the number of selected variables and two adaptive parameters to apply different importance weights for different variables according to their relative importance. The proposed methods select variables suitable for predicting the output and automatically determine the number of variables to be selected. Then, with the selected variables, they naturally obtain the resulting classifiers without an additional classification process. The classifiers obtained by the proposed methods yield competitive or better classification accuracy levels than those by the existing methods.
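The Introduction excerpt quoted for this entry describes filter methods: each variable is scored from the data alone, and a subset is chosen before any classifier is trained. A minimal sketch of that idea follows. It assumes a between-class to within-class sum-of-squares ratio as the score, which is only one possible filter criterion and is not claimed to be the one used by any paper listed here. The function names `bss_wss_scores` and `select_top_k` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def bss_wss_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Score each feature by between-class / within-class sum of squares."""
    # Group sample indices by class label.
    rows_by_class: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(y):
        rows_by_class[label].append(i)

    scores: List[float] = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        bss = 0.0  # spread of class means around the overall mean
        wss = 0.0  # spread of samples around their own class mean
        for rows in rows_by_class.values():
            values = [column[i] for i in rows]
            class_mean = sum(values) / len(values)
            bss += len(values) * (class_mean - overall_mean) ** 2
            wss += sum((v - class_mean) ** 2 for v in values)
        if wss > 0:
            scores.append(bss / wss)
        else:
            # No within-class variation: perfectly separating or constant.
            scores.append(float("inf") if bss > 0 else 0.0)
    return scores


def select_top_k(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k highest-scoring features."""
    scores = bss_wss_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 6 samples, 3 classes, 3 features.
# Feature 0 separates the classes, feature 1 is noise,
# and feature 2 is an exact copy of feature 0.
X_toy = [
    [1.0, 5.0, 1.0],
    [1.2, 3.0, 1.2],
    [3.0, 4.0, 3.0],
    [3.2, 5.5, 3.2],
    [5.0, 3.5, 5.0],
    [5.2, 4.5, 5.2],
]
y_toy = [0, 0, 1, 1, 2, 2]

top_two = select_top_k(X_toy, y_toy, 2)
```

On this toy data the two selected features are feature 0 and its copy, feature 2. Because each variable is scored in isolation, the filter cannot tell that the second adds no information, which is the redundancy risk the excerpt mentions.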

