2020
DOI: 10.3390/genes11070717

Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm

Abstract: Training a machine learning algorithm on an imbalanced data set is an inherently challenging task. It becomes more demanding when samples are limited but the number of features is massive (high dimensionality). High-dimensional, imbalanced data sets pose severe challenges in many real-world applications, such as biomedical data. Numerous researchers have investigated either class imbalance or high dimensionality and proposed various methods. Nonetheless, few approaches reported in the li…

Cited by 28 publications (7 citation statements)
References 76 publications
“…Implementation was performed in MATLAB R 2015. The datasets used are Lung Cancer [14], Prostrate Tumor from repository and SRBCT [14]. The details of the dataset are listed in table 5.1.…”
Section: Results (mentioning)
confidence: 99%
“…F-measure could be an appropriate measure to evaluate the efficiency of proposed method. Proposed PRBMF-iBAT is compared with Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA) [14] for Support Vector Machine. Fivefold cross validation is performed to compare the results.…”
Section: Results (mentioning)
confidence: 99%
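The F-measure mentioned in the statement above can be illustrated with a minimal sketch. It assumes binary labels with the minority class encoded as 1, and the helper names `f_measure` and `mean_f_measure` are hypothetical; neither the cited paper's MATLAB code nor its classifier is reproduced here. The second helper only averages per-fold scores, as one might do after a fivefold cross-validation run.

```python
def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """F-beta score for the `positive` class (beta=1 gives the usual F1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so the score is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_f_measure(folds, positive=1):
    """Average the F-measure over (y_true, y_pred) pairs, one pair per fold."""
    scores = [f_measure(t, p, positive) for t, p in folds]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Imbalanced toy labels: a classifier that always predicts the majority
    # class reaches 80% accuracy yet scores 0 on the minority-class F-measure.
    y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    y_pred = [0] * 10
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy, f_measure(y_true, y_pred))
```

The toy example shows why accuracy can mislead on imbalanced data while the F-measure does not.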
“…Identifying the most important features was based on the two most used feature selection filter methods in ML: (1) feature importance and (2) correlation-based feature selection. We used filter methods of feature selection because it is independent of the potential models [11]. Feature importance is a univariate filter that compares each feature’s correlation with the outcome separately and removes features with zero importance according to a gradient boosting machine (GBM) learning model.…”
Section: Methods (mentioning)
confidence: 99%
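The univariate filter idea in the statement above can be sketched as follows. This is an illustration under stated assumptions, not the citing authors' procedure: it swaps in the absolute Pearson correlation with the outcome as the per-feature score instead of GBM-based importance, and the names `pearson` and `filter_features` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no information about the outcome.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_features(columns, outcome, threshold=0.0):
    """Keep features whose |correlation with the outcome| exceeds `threshold`.

    `columns` maps a feature name to its list of values. Each feature is
    scored on its own, independently of any downstream model.
    """
    scores = {name: abs(pearson(vals, outcome)) for name, vals in columns.items()}
    return [name for name, score in scores.items() if score > threshold]


if __name__ == "__main__":
    columns = {
        "a": [1, 2, 3, 4],
        "const": [5, 5, 5, 5],  # zero variance, so it is scored 0 and dropped
        "b": [4, 3, 2, 1],
    }
    outcome = [0, 0, 1, 1]
    print(filter_features(columns, outcome))
```

Because each feature is scored separately, a filter like this cannot detect redundancy between features; that is what a correlation-based redundancy step would address.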
“…Due to inconsistent presentations of training intensities from various training methods such as "6 repetition maximum (RM)" in strength work, "85% of 1 RM" in weightlifting, or "bodyweight" in plyometrics, the "intensity" was discarded, but instead, the input of multiple training methods was allowed such that lower limb strength training represents training intensity with the use of at least 80% 1 RM in no less than two weeks of training. Since the type of sports background of subjects were diverse in the selected studies, they were summarized as "vertical based sports", "horizontal based sports", and "other sports" based on the characterized nature of sports movements to avoid an imbalanced dataset or cardinality issues [44]. Furthermore, training programs of intervention studies varied training volumes in different phases or periods.…”
Section: Identification Of Predictors (mentioning)
confidence: 99%