2020
DOI: 10.3390/s20102809

Data-Driven Cervical Cancer Prediction Model with Outlier Detection and Over-Sampling Methods

Abstract: Globally, cervical cancer remains the most prevalent cancer in females. Hence, it is necessary to identify the important risk factors of cervical cancer in order to classify potential patients. The present work proposes a cervical cancer prediction model (CCPM) that offers early prediction of cervical cancer using risk factors as inputs. The CCPM first removes outliers by using outlier detection methods such as density-based spatial clustering of applications with noise (DBSCAN) and isolation forest (iForest)…
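The outlier-removal stage described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the eps, min_samples, and contamination values, and the choice to intersect the two inlier masks, are assumptions.

```python
# Minimal sketch: flag outliers with DBSCAN and Isolation Forest, then drop them
# before modelling. The eps/min_samples/contamination values and the choice to
# intersect the two inlier masks are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

def remove_outliers(X, eps=1.5, min_samples=5, contamination=0.05):
    """Return a boolean mask of rows kept after DBSCAN + iForest filtering."""
    X_scaled = StandardScaler().fit_transform(X)

    # DBSCAN labels points in low-density regions as noise (-1).
    dbscan_inliers = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled) != -1

    # Isolation Forest scores points by how easily they are isolated (-1 = outlier).
    iforest_inliers = IsolationForest(
        contamination=contamination, random_state=0
    ).fit_predict(X_scaled) != -1

    # Keep rows that both detectors consider inliers; the paper may instead apply
    # each detector separately as alternative preprocessing variants.
    return dbscan_inliers & iforest_inliers

# Example on a synthetic risk-factor matrix with roughly the shape of the UCI
# cervical cancer (risk factors) dataset: 858 patients, ~30 attributes.
X = np.random.rand(858, 30)
X_clean = X[remove_outliers(X)]
```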

Cited by 206 publications (136 citation statements) | References 70 publications
“…In Table 5, we compare the proposed approach with some recent scholarly works that used the cervical cancer dataset, including a principal component analysis (PCA)-based SVM [33], a study in which the dataset was preprocessed and classified with numerous algorithms, with LR and SVM achieving the best accuracy [34], and a C5.0 decision tree [35]. The other methods include a multistage classification process that combined isolation forest (iForest), the synthetic minority over-sampling technique (SMOTE), and RF [36], a sparse autoencoder feature learning method combined with an ANN classifier [12], and a feature selection method combined with C5.0 and RF [37]. [35] C5.0: 96; Ijaz et al. [36] iForest+SMOTE+RF: 98.925; Mienye et al. [12] SAE+ANN: 98…”
Section: Methods / Accuracy (%)
Citation type: mentioning (confidence: 99%)
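The 98.925% accuracy attributed to Ijaz et al. [36] in the quoted comparison comes from a multistage iForest + SMOTE + RF pipeline. As a loose sketch (not the authors' code; the train/test split, SMOTE settings, and forest size are assumptions), the oversampling and classification stages after outlier removal might look like this with scikit-learn and imbalanced-learn:

```python
# Loose sketch of the SMOTE + random-forest stages of an iForest+SMOTE+RF-style
# pipeline [36], applied after outlier removal (see the earlier sketch).
# Split ratio, SMOTE settings, and n_estimators are illustrative assumptions.
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def smote_rf_accuracy(X_clean, y_clean, random_state=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_clean, y_clean, test_size=0.2, stratify=y_clean,
        random_state=random_state)

    # Oversample the minority (positive) class on the training split only, so
    # the synthetic samples never leak into the held-out test set.
    X_tr, y_tr = SMOTE(random_state=random_state).fit_resample(X_tr, y_tr)

    clf = RandomForestClassifier(n_estimators=200, random_state=random_state)
    clf.fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))
```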
“…In [14], the authors provide a survey of unsupervised machine learning algorithms proposed for outlier detection. In [15], the authors propose a cervical cancer prediction model (CCPM) for early prediction of cervical cancer using risk factors as inputs. The authors utilize several machine learning approaches and outlier detection methods for different preprocessing tasks.…”
Section: Background and Related Work
Citation type: mentioning (confidence: 99%)
“…It deals with the noisy instances in the majority class via a noise filter based on the DBSCAN clustering algorithm [30], combined with a minimum spanning tree (MST) algorithm to reduce the size of the negative class. The reason for combining DBSCAN clustering with the MST approach is that DBSCAN has proven to be a powerful tool for identifying and removing noisy instances and cleaning the overlap between classes [31], but it does not produce a well-balanced class distribution on its own. By viewing the data set as a weighted complete graph, the MST algorithm allows for discovering the core of the majority class, which is then used to remove as many redundant negative instances as are needed to balance both classes.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
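The DBSCAN-plus-MST undersampling idea quoted above can be illustrated with a short sketch. This is one possible reading, not the cited authors' algorithm: DBSCAN first discards noisy majority-class instances, then a minimum spanning tree over the remaining points is pruned from its leaves (the most peripheral instances) until the majority class matches the desired size. The DBSCAN parameters and the leaf-pruning rule are assumptions.

```python
# Illustrative sketch of shrinking the majority (negative) class with DBSCAN +
# an MST, as one possible reading of the quoted approach. The DBSCAN parameters
# and the leaf-pruning heuristic are assumptions, not the cited authors' method.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import DBSCAN

def undersample_majority(X_maj, n_keep, eps=0.1, min_samples=5):
    """Return indices into X_maj forming an approximate 'core' of size n_keep."""
    # Step 1: drop noisy majority instances flagged by DBSCAN (label -1).
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_maj)
    idx = np.flatnonzero(labels != -1)

    # Step 2: minimum spanning tree over the remaining points, viewing the data
    # as a complete weighted graph of pairwise Euclidean distances.
    dist = squareform(pdist(X_maj[idx]))
    mst = minimum_spanning_tree(dist)
    adj = (mst + mst.T).toarray() > 0  # symmetric tree adjacency

    # Step 3: peel off MST leaves (peripheral points) until n_keep points remain.
    alive = np.ones(len(idx), dtype=bool)
    while alive.sum() > n_keep:
        degrees = adj[np.ix_(alive, alive)].sum(axis=1)
        leaves = np.flatnonzero(alive)[degrees <= 1]
        if leaves.size == 0:
            break
        # Drop only as many leaves as still needed to reach the target size.
        n_drop = min(leaves.size, int(alive.sum()) - n_keep)
        alive[leaves[:n_drop]] = False
    return idx[alive]

# Example: reduce a synthetic 2-D majority class of 800 points to 55, e.g. to
# match a hypothetical minority class of 55 positive cases.
X_maj = np.random.rand(800, 2)
core_idx = undersample_majority(X_maj, n_keep=55)
```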