Cervical cancer is a common cancer that affects women all over the world. This is the fourth leading cause of death among women and has no symptoms in its early stages. At the cervix, cervical cancer cells develop slowly. If it can be detected early, this cancer can be successfully treated. Health professionals are now facing a major challenge in detecting such cancer until it spreads rapidly. This study applied various machine learning classification methods to predict cervical cancer using risk factors. The main aim of this research work is to be described of the performance variation of eight most classifications algorithm to detect cervical cancer disease based on the selection of various top features sets from the dataset. Multilayer Perceptron (MLP), Random Forest and k-Nearest Neighbor, Decision Tree, Logistic Regression, SVC, Gradient Boosting, AdaBoost are examples of machine learning classification algorithms that have been used to predict cervical cancer and help in early diagnosis. A variety of approaches are used to avoid missing values in the dataset. To choose the various best features, a combination of feature selection techniques such as Chi-square, SelectBest and Random Forest was used. The performance of those classifications is evaluated using the accuracy, recall, precision and f1-score parameters. On a variety of top feature sets, MLP outperformed other classification models. The majority of classification models, on the other hand, claim to have the highest accuracy on the top 25 features in dataset splitting ratio (70:30). For each model, the percentage of correctly classified instances has been presented and all of the results are then discussed. Medical professionals will be able to use the suggested approach to perform research on cervical cancer.
Background: Protein-Protein Interaction (PPI) has emerged as a key role in the control of many biological processes including protein function, disease incidence, and therapy design. However, the identification of PPI by wet lab experiment is a challenging task, since it is laborious, time consuming and expensive. Therefore, computational prediction of PPI is now given emphasis before going to the experimental validation, since it is simultaneously less laborious, time saver and cost minimizer. Objective: The objective of this study is to develop an improved computational method for PPI prediction mapping on Homo sapiens by using the amino acid sequence features in a supervised learning framework. Methods: The experimentally validated 91 positive-PPI pairs of human protein sequences were collected from IntAct Molecular Interaction Database. Then we constructed three balanced datasets with ratios 1:1, 1:2 and 1:3 of positive and negative PPI samples. Then we partitioned each dataset into training (80%) and independent test (20%) datasets. Again each training dataset was partitioned into four mutually exclusive groups of equal sizes for interchanging each group with independent test group to perform 5-fold cross validation (CV). Then we trained candidate seven classifiers (NN, SVM, LR, NB, KNN, AB and RF) with each ratio case to obtain the better PPI predictor by comparing their performance scores. Results: The random forest (RF) based predictor that was trained with 1:2 ratio of positive-PPI and negative-PPI samples based on AAC encoding features provided the most accurate PPI prediction by producing the highest average performance scores of accuracy (93.50%), sensitivity (95.0%), MCC (85.2%), AUC (0.941) and pAUC (0.236) with the 5-fold cross-validation. It also achieved the highest average performance scores of accuracy (92.0%), sensitivity (94.0%), MCC (83.6%), AUC (0.922) and pAUC (0.207) with the independent test datasets in a comparison of the other candidate and existing predictors. Conclusion: The final resultant prediction strongly recommend that the RF based predictor is a better prediction model of PPI mapping on Homo sapiens.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.