2019
DOI: 10.1515/cmb-2019-0008

Analyzing high dimensional correlated data using feature ranking and classifiers

Abstract: The Illumina Infinium HumanMethylation27 (Illumina 27K) BeadChip assay is a relatively recent high-throughput technology that allows over 27,000 CpGs to be assayed. Illumina 27K methylation data are less commonly used in bioinformatics than gene expression data, which creates a critical need to find the optimal feature ranking (FR) method for handling such high-dimensional data. The optimal FR method for a given classifier is not well known, and choosing the best-performing FR method becomes more challenging…
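The abstract frames the task as finding, for a given classifier, the FR method that performs best on high-dimensional methylation data. Below is a minimal sketch of that comparison loop in plain Python, assuming a samples-by-features matrix of beta values and binary labels. The rankers `mean_difference` and `variance`, the nearest-centroid classifier, and the cutoff `k` are hypothetical stand-ins chosen for illustration; they are not the FR methods or classifiers evaluated in the paper. Features are ranked on the training split only, so the test accuracy is not inflated by selecting features with the test labels.

```python
import statistics
from typing import Callable, Dict, List, Sequence

# Rows are samples, columns are features (e.g., CpG beta values in [0, 1]).
Matrix = List[List[float]]
# A ranker returns one score per feature; higher means "more relevant".
Ranker = Callable[[Matrix, Sequence[int]], List[float]]


def mean_difference(X: Matrix, y: Sequence[int]) -> List[float]:
    """Hypothetical supervised ranker: |mean(class 1) - mean(class 0)| per feature."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)))
    return scores


def variance(X: Matrix, y: Sequence[int]) -> List[float]:
    """Hypothetical unsupervised ranker: per-feature variance (labels ignored)."""
    return [statistics.pvariance([row[j] for row in X]) for j in range(len(X[0]))]


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_accuracy(X_tr: Matrix, y_tr: Sequence[int], X_te: Matrix,
                              y_te: Sequence[int], cols: List[int]) -> float:
    """Fit per-class centroids on the selected columns; return test accuracy."""
    centroids = {}
    for c in sorted(set(y_tr)):  # sorted, so distance ties go to the smallest label
        rows = [r for r, lab in zip(X_tr, y_tr) if lab == c]
        centroids[c] = [statistics.fmean(r[j] for r in rows) for j in cols]
    correct = 0
    for row, lab in zip(X_te, y_te):
        x = [row[j] for j in cols]
        pred = min(centroids, key=lambda c: sum((xi - mi) ** 2
                                               for xi, mi in zip(x, centroids[c])))
        correct += pred == lab
    return correct / len(y_te)


def compare_rankers(rankers: Dict[str, Ranker], X_tr: Matrix, y_tr: Sequence[int],
                    X_te: Matrix, y_te: Sequence[int], k: int) -> Dict[str, float]:
    """Rank on the training split only, keep the top k, score on the test split."""
    return {name: nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te,
                                            top_k(rank(X_tr, y_tr), k))
            for name, rank in rankers.items()}


# Toy data: feature 0 separates the classes; feature 1 is high-variance noise.
X_train = [[0.1, 0.0], [0.2, 1.0], [0.9, 0.0], [0.8, 1.0]]
y_train = [0, 0, 1, 1]
X_test = [[0.15, 1.0], [0.85, 0.0]]
y_test = [0, 1]
results = compare_rankers({"mean_difference": mean_difference, "variance": variance},
                          X_train, y_train, X_test, y_test, k=1)
print(results)  # {'mean_difference': 1.0, 'variance': 0.5}
```

On this toy data the label-blind variance ranker keeps the noisy feature and the classifier falls to chance, which illustrates why the choice of FR method matters for a fixed classifier.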


Cited by 7 publications (5 citation statements)
References: 42 publications
“…(iii) Due to partial probe missing in the TCGA data, we did not take into account the local methylation correlations (i.e., significant correlations in methylation levels in neighboring regions) in this study, as it could lead to the loss of information related to methylation cascade regions. We plan to address this issue using more sophisticated methods in our future research [34][35][36]. (iv) Moreover, according to UMAP cluster results, it can be determined that there are relatively obvious differences between samples from different databases, which can be attributed to the different methods used to correct and normalize methylation data [37].…”
Section: Discussion (mentioning, confidence: 99%)
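The quoted limitation concerns correlations in methylation levels between neighboring CpG probes when some probe values are missing. Below is a minimal sketch of measuring such local correlation in plain Python, assuming probes on a single chromosome with known genomic coordinates and missing values encoded as `None`. Pairwise deletion (using only samples where both probes are observed) is one common way to handle the gaps and is chosen here for illustration; it is not the method of the citing study, and `max_gap` and `min_n` are hypothetical parameters.

```python
import math
from typing import List, Optional, Sequence, Tuple

# One list per probe, one entry per sample; None marks a missing probe value.
Betas = Sequence[Optional[float]]


def pearson_pairwise_complete(a: Betas, b: Betas, min_n: int = 3) -> Optional[float]:
    """Pearson r over samples where both probes are observed (pairwise deletion)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < min_n:
        return None  # too few shared samples to estimate a correlation
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant probe has an undefined correlation
    return sxy / math.sqrt(sxx * syy)


def neighbor_correlations(positions: Sequence[int], betas: Sequence[Betas],
                          max_gap: int) -> List[Tuple[int, int, Optional[float]]]:
    """Correlate each probe with its next neighbor if they lie within max_gap bp.

    Assumes all probes are on one chromosome; max_gap is a hypothetical parameter.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    out = []
    for i, j in zip(order, order[1:]):
        if positions[j] - positions[i] <= max_gap:
            out.append((i, j, pearson_pairwise_complete(betas[i], betas[j])))
    return out


positions = [100, 150, 5000]
betas = [[0.1, 0.2, 0.3, 0.4],
         [0.2, 0.4, 0.6, None],   # last sample missing for this probe
         [0.9, 0.1, 0.5, 0.5]]
pairs = neighbor_correlations(positions, betas, max_gap=1000)
print(pairs)
```

Only probes 0 and 1 fall within the gap, and their correlation is estimated from the three samples in which both were measured.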
“…In building a classification model, it is a very important process to select features that affect the classification result [16]. In this study, popular feature selection methods were used, such as F-score, chi-square, and mutual information, which are univariate feature selection methods [16][17][18]. In addition, a feature selection technique using Gini importance was also used [19].…”
Section: Feature Selection Methods (mentioning, confidence: 99%)
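The quoted passage names chi-square and mutual information as univariate filters: each feature is scored against the class label on its own, without regard to the other features. Below is a minimal sketch of both scores in plain Python, assuming each continuous beta value is first binarized at a hypothetical threshold of 0.5 (`discretize` and the threshold are illustrative assumptions, not taken from the citing study). Library implementations differ; scikit-learn's `chi2`, for example, treats non-negative feature values as frequencies instead of building a contingency table, and its `mutual_info_classif` uses a nearest-neighbor estimator for continuous features.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def discretize(values: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Hypothetical binarization of beta values: 1 if >= threshold, else 0."""
    return [int(v >= threshold) for v in values]


def chi_square(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic of the x-by-y contingency table."""
    n = len(x)
    joint, cx, cy = Counter(zip(x, y)), Counter(x), Counter(y)
    stat = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            stat += (joint[(a, b)] - expected) ** 2 / expected  # Counter gives 0 if unseen
    return stat


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in mutual information I(X; Y) in nats from empirical frequencies."""
    n = len(x)
    joint, cx, cy = Counter(zip(x, y)), Counter(x), Counter(y)
    # Zero-count cells contribute nothing, so only observed pairs are summed.
    return sum((nab / n) * math.log(nab * n / (cx[a] * cy[b]))
               for (a, b), nab in joint.items())


labels = [0, 0, 1, 1]
feature = discretize([0.12, 0.30, 0.81, 0.66])  # -> [0, 0, 1, 1]
print(chi_square(feature, labels), mutual_information(feature, labels))
```

For this perfectly dependent feature the chi-square statistic is 4.0 and the mutual information is log 2; a feature independent of the labels scores 0 on both, so either score can rank features one at a time.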
“…The Fisher score (F-score) is one of the most popular feature selection methods [21]. F-score is a univariate selection method and selects the optimal features based on a statistical model [16]. It can be used mainly in linear models.…”
Section: Fisher Score (mentioning, confidence: 99%)
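A common definition of the Fisher score for feature $j$ over $c$ classes, where class $k$ has $n_k$ samples with mean $\mu_{kj}$ and variance $\sigma_{kj}^2$ and $\mu_j$ is the overall mean of the feature, is $F_j = \sum_{k=1}^{c} n_k(\mu_{kj}-\mu_j)^2 \,/\, \sum_{k=1}^{c} n_k\sigma_{kj}^2$, i.e. between-class scatter over within-class scatter. Below is a minimal plain-Python sketch of that definition; it is not necessarily the exact variant used by the citing study, and the handling of zero within-class variance is an assumption made here.

```python
import math
import statistics
from typing import List, Sequence


def fisher_score(X: List[List[float]], y: Sequence[int]) -> List[float]:
    """Per-feature Fisher score: between-class scatter / within-class scatter.

    X has one row per sample and one column per feature; each feature is
    scored independently of the others (a univariate filter).
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu = statistics.fmean(col)  # overall mean of feature j
        between = within = 0.0
        for c in classes:
            vals = [v for v, lab in zip(col, y) if lab == c]
            n_k = len(vals)
            between += n_k * (statistics.fmean(vals) - mu) ** 2
            within += n_k * statistics.pvariance(vals)  # population variance
        if within > 0:
            scores.append(between / within)
        else:
            # Assumption: zero within-class spread scores inf if classes differ, else 0.
            scores.append(math.inf if between > 0 else 0.0)
    return scores


X = [[1.0, 5.0], [2.0, 6.0], [3.0, 5.0], [4.0, 6.0]]
y = [0, 0, 1, 1]
print(fisher_score(X, y))  # [4.0, 0.0]
```

For a fixed dataset, the one-way ANOVA F-statistic (which scikit-learn's `f_classif` computes) equals this score multiplied by $(n-c)/(c-1)$, where $n$ is the number of samples, so the two produce the same feature ranking even though their values differ.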
“…Accuracy is a commonly used criterion for evaluating the efficacy of classification models when applied to balanced datasets. The computation entails the determination of the proportion of correctly classified samples relative to the overall number of samples used in the classification model [31].…”
Section: Performance Metrics (mentioning, confidence: 99%)
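The quoted definition is accuracy = (correctly classified samples) / (total samples). Below is a minimal plain-Python sketch of it. Balanced accuracy (the mean of per-class recalls) is included only as a contrast that shows why plain accuracy is usually reserved for balanced datasets; it is not taken from the quoted study.

```python
from typing import Hashable, Sequence


def _check(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall; unlike accuracy, each class counts equally."""
    _check(y_true, y_pred)
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Imbalanced toy case: always predicting the majority class looks good on accuracy.
y_true = [1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 1, 1]
print(accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))  # 0.8 0.5
```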