Cervical cancer is one of the leading causes of premature mortality among women worldwide and more than 85% of these deaths are in developing countries. There are several risk factors associated with cervical cancer. In this paper, we developed a predictive model for predicting the outcome of patients with cervical cancer, given risk patterns from individual medical records and preliminary screening. This work presents a decision tree (DT) classification algorithm to analyze the risk factors of cervical cancer. Recursive feature elimination (RFE) and least absolute shrinkage and selection operator (LASSO) feature selection techniques were fully explored to determine the most important attributes for cervical cancer prediction. The dataset employed here contains missing values and is highly imbalanced. Therefore, a combination of under and oversampling techniques called SMOTETomek was employed. A comparative analysis of the proposed model has been performed to show the effectiveness of feature selection and class imbalance based on the classifier’s accuracy, sensitivity, and specificity. The DT with the selected features from RFE and SMOTETomek has better results with an accuracy of 98.72% and sensitivity of 100%. DT classifier is shown to have better performance in handling classification problems when the features are reduced, and the problem of high class imbalance is addressed.
Breast cancer is a prevalent disease that affects mostly women, and early diagnosis will expedite the treatment of this ailment. Recently, machine learning (ML) techniques have been employed in biomedical and informatics to help fight breast cancer. Extracting information from data to support the clinical diagnosis of breast cancer is a tedious and time-consuming task. The use of machine learning and feature extraction techniques has significantly changed the whole process of a breast cancer diagnosis. This research work proposed a machine learning model for the classification of breast cancer. To achieve this, a support vector machine (SVM) was employed for the classification, and linear discriminant analysis (LDA) was employed for feature extraction. We measured our model’s feature extraction performance in principal component analysis (PCA) and random forest for classification. A comparative analysis of the proposed model was performed to show the effectiveness of the feature extraction, and we computed missing values based on the classifier’s accuracy, precision, and recall. The original Wisconsin Breast Cancer dataset (WBCD) and Wisconsin Prognostic Breast Cancer dataset (WPBC) were used. We evaluated performance in two phases: In phase 1, rows containing missing values were computed using the mean, and in phase 2, rows containing missing values were computed using the median. LDA–SVM when median was used to compute missing values has better results, with accuracy of 99.2%, recall of 98.0% and precision of 98.0% on the WBCD dataset and an accuracy of 79.5%, recall of 76.0% and precision of 59.0% on the WPBC dataset. The SVM classifier had a better performance in handling classification problems when LDA was applied and the median was used as a method for computing missing values.
With the advent of new technologies in the medical field, huge amounts of cancerous data have been collected and are readily accessible to the medical research community. Over the years, researchers have employed advanced data mining and machine learning techniques to develop better models that can analyze datasets to extract the conceived patterns, ideas, and hidden knowledge. The mined information can be used as a support in decision making for diagnostic processes. These techniques, while being able to predict future outcomes of certain diseases effectively, can discover and identify patterns and relationships between them from complex datasets. In this research, a predictive model for predicting the outcome of patients’ cervical cancer results has been developed, given risk patterns from individual medical records and preliminary screening tests. This work presents a Decision tree (DT) classification algorithm and shows the advantage of feature selection approaches in the prediction of cervical cancer using recursive feature elimination technique for dimensionality reduction for improving the accuracy, sensitivity, and specificity of the model. The dataset employed here suffers from missing values and is highly imbalanced. Therefore, a combination of under and oversampling techniques called SMOTETomek was employed. A comparative analysis of the proposed model has been performed to show the effectiveness of feature selection and class imbalance based on the classifier’s accuracy, sensitivity, and specificity. The DT with the selected features and SMOTETomek has better results with an accuracy of 98%, sensitivity of 100%, and specificity of 97%. Decision Tree classifier is shown to have excellent performance in handling classification assignment when the features are reduced, and the problem of imbalance class is addressed.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.