2017
DOI: 10.11591/ijece.v7i4.pp2215-2222

Learning from a Class Imbalanced Public Health Dataset: a Cost-based Comparison of Classifier Performance

Abstract: Public health care systems routinely collect health-related data from the population. This data can be analyzed using data mining techniques to find novel, interesting patterns, which could help formulate effective public health policies and interventions. The occurrence of chronic illness is rare in the population, and the effect of this class imbalance on the performance of various classifiers was studied. The objective of this work is to identify the best classifiers for class imbalanced health datasets thr…


Cited by 19 publications (14 citation statements)
References 15 publications
“…The disadvantage of this technique is that it may take longer training time and result in over-fitting since there is a significant increase in the size of the training set. A well-known oversampling technique known as Synthetic Minority Oversampling Technique (SMOTE), is used to oversample the minority class by creating synthetic instances to replicate the minority classes and increase their number of instances in the training set [31]. These synthetic instances are produced by considering two key parameters which are the number of instances (n) and the nearest neighbors (k).…”
Section: Class Balancing (mentioning)
confidence: 99%
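The statement above describes SMOTE as generating synthetic minority instances from two parameters, the number of instances (n) and the number of nearest neighbours (k). The following is a minimal sketch, assuming numeric feature vectors, Euclidean distance, and n taken as the total number of synthetic samples to create. The original SMOTE formulation expresses the amount of oversampling per minority sample; this sketch instead draws a random base sample each time. The names `smote` and `k_nearest` are hypothetical, not a library API:

```python
import math
import random


def k_nearest(points, idx, k):
    """Indices of the k points closest (Euclidean) to points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(minority, n, k=5, seed=None):
    """Return n synthetic minority samples made by SMOTE-style interpolation.

    Assumption: n is the total number of synthetic samples wanted, and each one
    starts from a randomly drawn minority sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, i, k))
        gap = rng.random()  # position along the segment, in [0, 1)
        base, neighbour = minority[i], minority[j]
        synthetic.append(tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour)))
    return synthetic


minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_samples = smote(minority, n=6, k=2, seed=0)
print(len(new_samples))  # 6
```

Each synthetic point lies on the line segment between a minority sample and one of its k nearest minority neighbours. As a result, the minority region is filled in rather than existing rows simply being duplicated, as in random oversampling.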
“…The Bayesian Network is preferred as past academic works have shown that this classifier exhibits a strong correlation among the attributes in the patient disease diagnosis. Other than that, the classifier is robust to unrelated variables, noise and confounding factors that are not part of the classification [31]. Bayesian Network has been broadly employed in many medical diagnoses based on previous literature studies, especially for cancer prediction and recently the use of Bayesian Network classifiers in breast cancer prediction is trending.…”
Section: Bayesian Network (mentioning)
confidence: 99%
“…ROC graphs are bi-dimensional graphs where on the Y axis tp rate is plotted and on the X axis fp rate is plotted. A ROC graph describe relative trade-offs between benefits (true positives) and costs (false positives) [23].…”
Section: Gaussian Naïve Bayes (mentioning)
confidence: 99%
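The statement above describes an ROC graph that plots the tp rate on the Y axis against the fp rate on the X axis. The following is a sketch, assuming binary labels (1 = positive) and a classifier that outputs a real-valued score for each example. It obtains the ROC points by sweeping a decision threshold over the sorted scores. The helper names `roc_points` and `trapezoid_auc` are illustrative only:

```python
def roc_points(labels, scores):
    """Return (fp rate, tp rate) pairs, one per distinct score threshold."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Visit examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores cross the threshold together, so consume all of them.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3]
pts = roc_points(labels, scores)
print(pts[3])              # (0.3333333333333333, 0.6666666666666666)
print(trapezoid_auc(pts))  # approximately 0.889 (= 8/9)
```

Each point corresponds to one threshold. Moving along the curve to the right accepts more false positives (cost) in exchange for climbing higher in true positives (benefit), which is the trade-off the statement describes.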
“…The schemes are utilized to select the best subset of aspects. In [18] studied to detect the best classifiers for class imbalanced health datasets through a price depended comparison of classifier performance. The uneven misclassification prices were characterized in a cost matrix, and cost-benefit.…”
Section: Introduction (mentioning)
confidence: 99%
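The statement above, like the paper's title, refers to comparing classifiers by misclassification cost, with the unequal costs held in a cost matrix. The following is a minimal sketch, assuming a binary problem in which `cost[actual][predicted]` gives the cost of each outcome. The cost values (10 for a missed positive, 1 for a false alarm) and the two prediction vectors are hypothetical. They were chosen only to show how accuracy and cost can rank classifiers differently on imbalanced data:

```python
# Hypothetical cost matrix, indexed as COST[actual][predicted] (0 = healthy, 1 = ill).
# A missed case (false negative) is assumed to cost 10, a false alarm 1, a correct call 0.
COST = [[0, 1],
        [10, 0]]


def total_cost(y_true, y_pred, cost=COST):
    """Sum the cost-matrix entry for every (actual, predicted) pair."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(cost[a][p] for a, p in zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual label."""
    return sum(a == p for a, p in zip(y_true, y_pred)) / len(y_true)


# Imbalanced toy data: 18 negatives, 2 positives (10% minority class).
y_true = [0] * 18 + [1] * 2
pred_a = [0] * 20                       # always predicts the majority class
pred_b = [1] * 3 + [0] * 15 + [1] * 2   # 3 false alarms, both positives found

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, accuracy(y_true, pred), total_cost(y_true, pred))
# A 0.9 20
# B 0.85 3
```

Accuracy favours A, which never detects the rare condition, while total cost favours B. This kind of reversal is what motivates a cost-based rather than an accuracy-based comparison on class-imbalanced health data.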