Breast cancer kills many women around the world. If it is treated at an early stage, mortality rates may be significantly reduced. Many approaches have been proposed to help detect breast cancer early. This study proposes a novel hybrid feature selection model that aims to select features efficiently and classify breast lesions accurately. Feature selection combines relief with binary Harris hawk optimization (BHHO). Classification then uses k-nearest neighbor (k-NN), support vector machine (SVM), logistic regression (LR) and naive Bayes (NB). The hybrid model is tested on three breast cancer datasets: the Wisconsin diagnostic breast cancer dataset (WDBC), the Wisconsin breast cancer dataset (WBCD) and the mammographic breast cancer dataset (MBCD). According to the experimental results, the relief and BHHO hybrid model improves the performance of all classification algorithms on all three datasets. For WDBC, the relief-BHO-SVM model shows the highest classification rates, with an accuracy of 98.77%, precision of 97.17%, recall of 99.52%, F1-score of 98.33%, specificity of 99.72% and balanced accuracy of 99.62%. For WBCD, the relief-BHO-SVM model achieves an accuracy of 99.28%, precision of 98.76%, recall of 99.17%, F1-score of 98.96%, specificity of 99.56% and balanced accuracy of 99.36%. For MBCD, the relief-BHO-SVM model performs best, with an accuracy of 97.44%, precision of 97.41%, recall of 98.26%, F1-score of 97.84%, specificity of 97.47% and balanced accuracy of 97.86%. Furthermore, the relief-BHO-SVM model achieves better results than other known approaches, and it compares well with recent studies on breast cancer classification.
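The six metrics reported above all follow from the four counts of a binary confusion matrix. The sketch below shows the standard definitions. It is not taken from the study: the names `ConfusionCounts` and `classification_metrics` and the example counts are hypothetical, and it assumes malignant is the positive class and that both classes occur among the labels and the predictions (otherwise a denominator is zero). Under these definitions balanced accuracy is the mean of recall and specificity, which agrees with the reported figures, for example (99.52% + 99.72%) / 2 = 99.62% for WDBC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (assumed positive class: malignant)."""
    tp: int  # positive cases predicted positive
    fp: int  # negative cases predicted positive
    tn: int  # negative cases predicted negative
    fn: int  # positive cases predicted negative


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the six metrics as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)        # also called sensitivity
    specificity = c.tn / (c.tn + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical counts, chosen only to illustrate the call.
example = classification_metrics(ConfusionCounts(tp=90, fp=10, tn=95, fn=5))
```

Balanced accuracy is worth reporting next to plain accuracy when the benign and malignant classes differ in size, since it weights the two classes equally.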
Breast cancer seriously affects many women. If it is detected at an early stage, it may be cured. This paper proposes a novel classification model based on improved machine learning algorithms for diagnosing breast cancer at its initial stage. The improved models are built by combining feature selection with Bayesian optimization. Support Vector Machine, K-Nearest Neighbor, Naive Bayes, Ensemble Learning and Decision Tree were used as the machine learning algorithms. All experiments were run on two datasets: the Wisconsin Breast Cancer Dataset (WBCD) and the Mammographic Breast Cancer Dataset (MBCD). Experiments were implemented to obtain the best classification process. Relief, Least Absolute Shrinkage and Selection Operator (LASSO) and Sequential Forward Selection were each used to determine the most relevant features. The models' hyperparameters were then tuned with Bayesian optimization. Experimental results showed that the unified feature selection-hyperparameter optimization method improved classification performance for all machine learning algorithms. Among the various experiments, LASSO-BO-SVM showed the highest accuracy, precision, recall and F1-score on the two datasets (97.95%, 98.28%, 98.28%, 98.28% for MBCD and 98.95%, 97.17%, 100%, 98.56% for MBCD), outperforming recent studies.
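Of the three feature selection methods named above, Sequential Forward Selection is the simplest to sketch: start with no features and greedily add whichever one raises a score the most, stopping when nothing helps. The code below is a minimal illustration under stated assumptions, not the paper's implementation. The function name, the stopping rule and `toy_score` are hypothetical; in a setting like the one described, the score would more plausibly be a cross-validated classifier metric.

```python
from typing import Callable, FrozenSet, List


def sequential_forward_selection(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    max_features: int,
) -> List[int]:
    """Greedily pick feature indices; returns them in the order they were added."""
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every one-feature extension of the current subset.
        scored = [(score(frozenset(selected + [f])), f) for f in candidates]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        best_score = top_score
    return selected


# Hypothetical scoring function: per-feature usefulness minus a size penalty.
_WEIGHTS = {0: 0.1, 1: 0.5, 2: 0.3, 3: 0.0}


def toy_score(subset: FrozenSet[int]) -> float:
    return sum(_WEIGHTS[f] for f in subset) - 0.05 * len(subset)


chosen = sequential_forward_selection(n_features=4, score=toy_score, max_features=4)
```

With `toy_score`, feature 3 is never selected because it adds only the size penalty. Passing the score as a callback keeps the search independent of the classifier, so the same loop could wrap any of the algorithms listed above.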