2016
DOI: 10.1186/s12859-016-0990-0
McTwo: a two-step feature selection algorithm based on maximal information coefficient

Abstract: Background: High-throughput bio-OMIC technologies are producing high-dimensional data from bio-samples at an ever-increasing rate, whereas the training sample number in a traditional experiment remains small due to various difficulties. This “large p, small n” paradigm in the area of biomedical “big data” may be at least partly solved by feature selection algorithms, which select only features significantly associated with phenotypes. Feature selection is an NP-hard problem. Due to the exponentially increased time…

Cited by 102 publications (68 citation statements)
References 44 publications
“…The MIC, with a range of [0, 1], is normalised and symmetric. Moreover, a high value of MIC indicates a high correlation between the corresponding variables, whereas MIC = 0 indicates that the two corresponding variables are independent (Ge et al., 2016). The MIC of two variables x_i and x_j is defined as follows (Zhang, Jia, Huang, Qiu, & Zhou, 2014):…”
Section: Correlated and Weakly Correlated Variable Division
Mentioning confidence: 99%
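The MIC definition referenced above is truncated in this excerpt. As an illustration only, the sketch below approximates MIC in the spirit of Reshef et al.'s definition: the mutual information of an nx-by-ny grid, normalised by log2(min(nx, ny)), maximised over grids whose cell count is bounded by n^0.6. This simplified version uses equal-width bins rather than the optimised partitions of the original MINE algorithm, so it is a rough approximation, not the estimator used by McTwo.

```python
import numpy as np

def mutual_info(x, y, nx, ny):
    """Mutual information (in bits) of an nx-by-ny equal-width grid over (x, y)."""
    joint, _, _ = np.histogram2d(x, y, bins=(nx, ny))
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of x, shape (nx, 1)
    py = p.sum(axis=0, keepdims=True)   # marginal of y, shape (1, ny)
    nz = p > 0                          # avoid log(0) on empty cells
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

def mic(x, y):
    """Simplified MIC: max over grids (nx * ny <= n**0.6) of MI / log2(min(nx, ny))."""
    n = len(x)
    b = max(int(n ** 0.6), 4)           # grid-size budget, as in the MIC definition
    best = 0.0
    for nx in range(2, b + 1):
        for ny in range(2, b // nx + 1):
            best = max(best, mutual_info(x, y, nx, ny) / np.log2(min(nx, ny)))
    return best
```

A deterministic relationship drives this score toward 1, while independent variables score near 0, matching the range and interpretation described in the quoted passage.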
“…Wrappers screen for a subset of features with an optimal performance measurement, usually the classification accuracy or error rate. Wrappers are usually slower but more accurate than filters [32]. This study therefore chose two wrapper feature selection algorithms, CFS (Correlation-based Feature Subset) and CSE (Classifier Subset Evaluator), to find the feature subset significantly associated with the phenotypes [33,34].…”
Section: Feature Optimization and Classification
Mentioning confidence: 99%
“…CSE reduced the number of features from 493 to only 23. Previous comparative studies demonstrated that the above feature selection algorithms usually performed best at selecting a feature subset with good classification performance, while filter-based feature selection algorithms such as the t-test optimize the phenotype association significance [32].…”
Section: Feature Optimization and Classification
Mentioning confidence: 99%
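The wrapper strategy described in these passages (evaluating candidate feature subsets by a classifier's held-out accuracy) can be sketched as a greedy forward search. This is an illustrative toy, not the CFS or CSE implementations cited above: it uses a hypothetical nearest-centroid classifier and a single random train/test split as the subset score.

```python
import numpy as np

def centroid_accuracy(Xtr, ytr, Xte, yte):
    """Accuracy of a nearest-centroid classifier trained on (Xtr, ytr)."""
    classes = np.unique(ytr)
    cents = np.stack([Xtr[ytr == c].mean(axis=0) for c in classes])
    # Squared distance from each test point to each class centroid.
    d = ((Xte[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    pred = classes[d.argmin(axis=1)]
    return float((pred == yte).mean())

def forward_wrapper(X, y, k, train_frac=0.7, seed=0):
    """Greedy forward wrapper: repeatedly add the feature that most
    improves held-out classification accuracy, until k are selected."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(train_frac * len(y))
    tr, te = idx[:cut], idx[cut:]
    selected = []
    while len(selected) < k:
        scores = []
        for f in range(X.shape[1]):
            if f in selected:
                scores.append(-1.0)  # never re-select a feature
                continue
            cols = selected + [f]
            scores.append(centroid_accuracy(X[tr][:, cols], y[tr],
                                            X[te][:, cols], y[te]))
        selected.append(int(np.argmax(scores)))
    return selected
```

Because every candidate subset is scored by retraining the classifier, the cost grows with both the feature count and k, which is why wrappers are slower than filters such as the t-test, as the quoted study notes.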
“…The method and software tool achieved good performance on several bioinformatics problems [14–16]. Zhou’s lab (Health Informatics Laboratory) described a feature selection algorithm, McTwo, which selects features that are associated with phenotypes, independent of each other, and achieve high classification performance [17]. In contrast, unsupervised methods select features when the document class labels are absent [18–20].…”
Section: Introduction
Mentioning confidence: 99%