2021
DOI: 10.3390/app11146574

On Combining Feature Selection and Over-Sampling Techniques for Breast Cancer Prediction

Abstract: Breast cancer prediction datasets are usually class-imbalanced: the numbers of data samples in the malignant and benign patient classes differ significantly. Over-sampling techniques can be used to re-balance the datasets and so construct more effective prediction models. Moreover, some related studies have used feature selection to remove irrelevant features from the datasets for further performance improvement. However, since the order of combining feature selection and over-sampling can result…
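The abstract turns on the order in which feature selection and over-sampling are combined. The following is a minimal sketch contrasting the two orders on a toy dataset. It assumes two stand-ins that are not from the paper: a hypothetical filter, `select_top_k`, that ranks features by the absolute difference of class means, and plain random over-sampling (duplicating minority rows) in place of the over-sampling techniques the paper evaluates. The labels 1 = malignant and 0 = benign are also an assumption.

```python
import random

def select_top_k(X, y, k):
    """Hypothetical filter: keep the k features whose class means differ most.

    Assumes binary labels (1 = malignant, 0 = benign) with both classes present.
    Returns the kept column indices in ascending order.
    """
    def class_mean(col, label):
        vals = [row[col] for row, t in zip(X, y) if t == label]
        return sum(vals) / len(vals)

    n_features = len(X[0])
    scores = [abs(class_mean(j, 1) - class_mean(j, 0)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def random_oversample(X, y, seed=0):
    """Stand-in over-sampler: duplicate random minority rows until classes are equal."""
    rng = random.Random(seed)
    counts = {c: y.count(c) for c in set(y)}
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pool = [row for row, t in zip(X, y) if t == minority]
    extra = [list(rng.choice(pool)) for _ in range(deficit)]
    return X + extra, y + [minority] * deficit

def project(X, cols):
    """Keep only the selected columns of every row."""
    return [[row[j] for j in cols] for row in X]

def fs_then_os(X, y, k):
    """Order 1: select features on the original data, then over-sample."""
    cols = select_top_k(X, y, k)
    return random_oversample(project(X, cols), y), cols

def os_then_fs(X, y, k):
    """Order 2: over-sample first, then select features on the balanced data."""
    Xb, yb = random_oversample(X, y)
    cols = select_top_k(Xb, yb, k)
    return (project(Xb, cols), yb), cols

# Toy data: 6 benign (0) rows and 2 malignant (1) rows, 3 features each.
X = [[0.1, 5.0, 1.0], [0.2, 5.1, 1.1], [0.1, 4.9, 0.9], [0.3, 5.0, 1.0],
     [0.2, 5.2, 1.2], [0.1, 5.1, 1.0], [0.9, 5.0, 3.0], [1.0, 4.9, 3.2]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

(X1, y1), cols1 = fs_then_os(X, y, k=2)
(X2, y2), cols2 = os_then_fs(X, y, k=2)
print(cols1, cols2, y1.count(1), y2.count(1))  # [0, 2] [0, 2] 6 6
```

Because over-sampling changes the class statistics that a filter scores, the two orders can keep different features on real data; on this small toy set both happen to keep columns 0 and 2. In either order, both steps would normally be fitted on the training split only, so that duplicated or synthetic rows do not leak into the evaluation data.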

Cited by 17 publications
(5 citation statements)
References 27 publications
“…These results agree with other authors who studied the combined feature selection and resampling method for imbalanced-data learning and found that in 79% of the study cases, balancing before feature selection improves the results [59]. We also showed the feasibility of combining both the feature selection and resampling methods in the same iterative process to deal with imbalanced data, in contrast to resampling before or after feature selection [59, 60]. Balancing validation data to deal with the imbalanced-data problem was similar to the strategy proposed by Jain et al., who used a weighted sum of recall and specificity as the fitness function [61].…”
Section: Discussion (mentioning; confidence: 99%)
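The passage above likens its validation-balancing strategy to Jain et al.'s fitness function, a weighted sum of recall and specificity. A minimal sketch of such a fitness function follows. The function name, the default weight `w = 0.5`, and treating label 1 (malignant) as the positive class are assumptions, since the excerpt does not give Jain et al.'s actual weighting.

```python
def recall_specificity_fitness(y_true, y_pred, w=0.5, positive=1):
    """Hypothetical fitness: w * recall + (1 - w) * specificity.

    Recall (sensitivity) is measured on the positive class; specificity on
    all other labels. An empty denominator contributes 0.0.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return w * recall + (1 - w) * specificity

y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
print(recall_specificity_fitness(y_true, y_pred))         # 0.625
always_benign = [0] * len(y_true)
print(recall_specificity_fitness(y_true, always_benign))  # 0.5
```

Both predictions above have the same accuracy (4/6), but the fitness scores the classifier that never predicts the minority class lower (0.5 vs 0.625), which is why a recall/specificity combination suits imbalanced data better than plain accuracy.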
“…Furthermore, the paper does not extensively compare the proposed framework with more recent and state-of-the-art clustering methods. Huang et al. [9] proposed a methodology to improve the accuracy of breast cancer prediction models. The paper highlights the challenges posed by imbalanced datasets in breast cancer prediction, where the minority class (cancer cases) has significantly fewer samples than the majority class (non-cancer cases).…”
Section: Figure 1 Progressive Learning Curve (mentioning; confidence: 99%)
“…By performing feature selection first, the number of features is reduced, so that SMOTE can work more effectively in generating synthetic samples that accurately represent the minority class. This was also suggested by earlier researchers (Huang et al., 2021).…”
Section: SMOTE (unclassified)
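The statement above argues for running feature selection before SMOTE. The following is a minimal sketch of the standard SMOTE interpolation step applied to minority rows that have already been reduced to a set of selected columns. The selected column indices, the toy values, `k`, and the seed are hypothetical illustrations, not settings from Huang et al. (2021).

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Minimal SMOTE sketch: create n_new synthetic minority rows.

    Each synthetic row lies on the segment between a randomly chosen minority
    row and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

# Hypothetical: a feature-selection step has already kept columns 0 and 2.
cols = [0, 2]
minority = [[0.9, 5.0, 3.0], [1.0, 4.9, 3.2], [0.95, 5.1, 3.1]]
reduced = [[row[j] for j in cols] for row in minority]

new_rows = smote(reduced, n_new=4, k=2)
for row in new_rows:
    print([round(v, 3) for v in row])
```

With fewer columns, the nearest-neighbour distances that SMOTE relies on are computed only over the retained features, which is the intuition behind the quoted claim.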