2016
DOI: 10.1109/access.2016.2619719

Comparing Oversampling Techniques to Handle the Class Imbalance Problem: A Customer Churn Prediction Case Study

Abstract: Customer retention is a major issue for various service-based organizations particularly telecom industry, wherein predictive models for observing the behavior of customers are one of the great instruments in customer retention process and inferring the future behavior of the customers. However, the performances of predictive models are greatly affected when the real-world data set is highly imbalanced. A data set is called imbalanced if the samples size from one class is very much smaller or larger than the o…

Cited by 250 publications (123 citation statements)
References 66 publications
“…Significantly, this problem often appears in cytotoxic datasets, which makes it difficult for computers to learn fully the features of cytotoxic compounds. As a result, an error will occur when computers classify unknown data (Amin et al, 2016; Estabrooks, 2010; Krawczyk, 2016; Nanni, Fantozzi, & Lazzarini, 2015). This has caused great difficulties in computer-based prediction of toxicity, impeding many studies and preventing application to drug screening.…”
mentioning
confidence: 99%
“…Data quality issues must be carefully considered since any problem with the data quality will seriously mar the performance of ML algorithms. A potential problem is that dataset may be imbalanced if the samples size from one class is very much smaller or larger than the other classes [321]. In such imbalanced datasets, the algorithm must be careful not to ignore the rare class by assuming it to be noise.…”
Section: F Data Quality Issues
mentioning
confidence: 99%
“…Similarly we have balanced out the classes in the Paris dataset. Although any other oversampling techniques could have been chosen and they are reported to have competitive performances [1] on various datasets, to avoid complexity we have chosen to apply SMOTE. The oversampled dataset was used to train classifiers and the result of such classifiers can be found in Table 3 and 4.…”
Section: Applying Oversampling Technique To Handle Overfitting
mentioning
confidence: 99%
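
The last statement above describes balancing classes with SMOTE before training classifiers. As a rough illustration of the idea only, the sketch below creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified, standard-library approximation, not the citing authors' pipeline and not a library implementation; the function name `smote_like_oversample`, its parameters, and the toy data are all assumptions made for this example.

```python
import random
from collections import Counter


def smote_like_oversample(X, y, minority_label, k=3, seed=0):
    """Hypothetical helper: add synthetic minority samples until the
    minority class is as large as the largest class.

    Simplified SMOTE-style interpolation; assumes numeric features.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")

    target = max(Counter(y).values())
    n_new = max(0, target - len(minority))

    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance
        # to the chosen base point and keep the k closest.
        others = [m for m in minority if m is not base]
        neighbours = sorted(
            others,
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        neighbour = rng.choice(neighbours)
        # The synthetic point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic = [a + gap * (b - a) for a, b in zip(base, neighbour)]
        X_out.append(synthetic)
        y_out.append(minority_label)
    return X_out, y_out


# Toy data (assumed for illustration): six majority and three minority samples.
X = [
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2],
    [2.0, 2.0], [2.2, 2.1], [2.1, 2.3],
]
y = [0] * 6 + [1] * 3

X_res, y_res = smote_like_oversample(X, y, minority_label=1, k=2)
class_counts = Counter(y_res)  # both classes now have the same size
```

In practice the oversampling step would normally be applied to the training split only, so that synthetic points never leak into the evaluation data.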