2011
DOI: 10.1371/journal.pone.0028210

The Influence of Feature Selection Methods on Accuracy, Stability and Interpretability of Molecular Signatures

Abstract: Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. In this study we compare feature selection methods on public gene expression datasets for breast cancer prognosis, in terms of predictive performance, stability and functional interpretability…

Cited by 329 publications (244 citation statements)
References 33 publications
“…In stability selection, a threshold value is used to score each feature. For each list, it is determined whether the examined feature exceeds the threshold, and based on this judgement a position is assigned to the feature [48]. The exponential weighting method extends stability selection by assigning points based on e^(−r/s), where r denotes the rank of the feature and s denotes the threshold value.…”
Section: Methods (mentioning)
Confidence: 99%
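As a rough illustration of the exponential-weighting score quoted above, the sketch below aggregates e^(−r/s) weights across several ranked feature lists. The function name, the aggregation-by-summation across lists, and the example data are my own assumptions, not taken from the cited works.

```python
import math

def exponential_weighting(ranked_lists, s):
    """Aggregate feature scores across several ranked lists.

    Each list orders feature names from most to least important.
    A feature at rank r (0-based) in a list contributes e^(-r/s),
    so features that appear near the top of many lists score highest.
    """
    scores = {}
    for ranking in ranked_lists:
        for r, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0.0) + math.exp(-r / s)
    return scores

# Example: three rankings produced from different subsamples of the data.
lists = [
    ["gene_A", "gene_B", "gene_C", "gene_D"],
    ["gene_B", "gene_A", "gene_D", "gene_C"],
    ["gene_A", "gene_C", "gene_B", "gene_D"],
]
scores = exponential_weighting(lists, s=2.0)
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, round(score, 3))
```

With a larger threshold parameter s, the weights decay more slowly with rank, so lower-ranked features still contribute noticeably to the aggregate score.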
“…Novakovic et al. compared 6 feature selection techniques on 2 data sets, using classification accuracy as the criterion [21]. Haury et al. compared 8 feature selection techniques on 4 data sets [22]. Silva et al. compared 4 existing feature selection techniques (information gain, gain ratio, chi square, correlation) on 1 data set from the domain of agriculture [23].…”
Section: Feature Selection (mentioning)
Confidence: 99%
“…They compared the stability of three wrapper approaches. Haury et al. [4] evaluated a number of feature ranking methods and one wrapper-based subset evaluation technique, and considered stability in terms of how many features are in common between two subsets generated from independent subsamples of the original data. Dunne et al. [5] considered wrappers using a 3-nearest-neighbor learner and three choices of search technique, evaluating stability by resampling the original dataset and finding the Hamming distance between the various feature subset masks.…”
Section: Related Work (mentioning)
Confidence: 99%
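A minimal sketch of the two stability measures mentioned in the statement above: the number of features two selected subsets have in common, and the Hamming distance between their binary selection masks. The function names and example subsets are illustrative, not taken from the cited works.

```python
import numpy as np

def common_features(subset_a, subset_b):
    """Number of features shared by two selected subsets."""
    return len(set(subset_a) & set(subset_b))

def hamming_distance(mask_a, mask_b):
    """Hamming distance between two binary feature-selection masks."""
    mask_a, mask_b = np.asarray(mask_a), np.asarray(mask_b)
    return int(np.sum(mask_a != mask_b))

# Example: 10 candidate features, two selection runs each keeping 4 of them.
selected_run1 = [0, 2, 5, 7]
selected_run2 = [0, 3, 5, 9]
mask1 = np.isin(np.arange(10), selected_run1).astype(int)
mask2 = np.isin(np.arange(10), selected_run2).astype(int)

print("features in common:", common_features(selected_run1, selected_run2))  # 2
print("Hamming distance:", hamming_distance(mask1, mask2))                   # 4
```

When both subsets have the same size k, the two quantities are directly related: the Hamming distance equals 2 * (k − number of features in common).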
“…Another work, Haury et al. [4], considers the role of overlap when assessing the stability of gene subsets. In addition to other analyses of their datasets, the researchers consider the fraction of instances shared by subsamples of the original data (either 80% or 0% overlap) when comparing the feature lists generated from those subsamples.…”
Section: Related Work (mentioning)
Confidence: 99%
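To make the 80% versus 0% overlap comparison concrete, here is a hypothetical sketch: two subsamples of the rows are drawn with a prescribed instance overlap, the same feature selector is run on each, and the fraction of selected features in common is reported. The selector (a simple correlation filter), the overlap construction, and all names are placeholders, not the procedure of the cited papers.

```python
import numpy as np

def select_top_k(X, y, k):
    """Rank features by absolute Pearson correlation with y and keep the top k."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()) + 1e-12
    )
    return set(np.argsort(-np.abs(corr))[:k])

def paired_subsamples(n, size, overlap, rng):
    """Two index arrays of length `size` sharing round(overlap * size) instances."""
    n_shared = int(round(overlap * size))
    perm = rng.permutation(n)
    shared, rest = perm[:n_shared], perm[n_shared:]
    a = np.concatenate([shared, rest[: size - n_shared]])
    b = np.concatenate([shared, rest[size - n_shared : 2 * (size - n_shared)]])
    return a, b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))                   # 200 samples, 500 "genes"
y = X[:, :5].sum(axis=1) + rng.normal(size=200)   # only the first 5 genes matter

for overlap in (0.8, 0.0):
    a, b = paired_subsamples(len(y), size=80, overlap=overlap, rng=rng)
    fa = select_top_k(X[a], y[a], k=20)
    fb = select_top_k(X[b], y[b], k=20)
    print(f"overlap={overlap:.0%}: fraction of selected features in common = {len(fa & fb) / 20:.2f}")
```

The general tendency such a simulation illustrates is that feature lists drawn from heavily overlapping subsamples agree more often than lists drawn from disjoint subsamples.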