2013
DOI: 10.7763/ijmlc.2013.v3.305

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

Abstract: Imbalance in data classification is a frequently discussed problem that is not well handled by classical classification techniques. The problem we tackled was to learn a binary classification model from large data with an accuracy constraint for the minority class. We propose a new meta-learning method that creates initial models using cost-sensitive learning by logistic regression and uses these models as initial chromosomes for a genetic algorithm. The method has been successfully tested on a large real-world…
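The abstract gives only the outline of the method: cost-sensitive logistic regression models seed the initial population of a genetic algorithm that optimizes accuracy under a minority-class accuracy constraint. The sketch below illustrates that outline rather than the authors' implementation; the chromosome encoding as raw coefficient vectors, the penalty-based fitness, and all GA operators and hyperparameters are assumptions.

```python
# Hypothetical sketch of the abstract's idea: cost-sensitive logistic regression
# models seed a GA that maximizes accuracy subject to a minority-class recall
# constraint. Encoding, fitness, and GA settings are assumptions, not paper details.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def seed_population(X, y, weights=(1, 5, 10, 20)):
    """Fit one cost-sensitive logistic regression per misclassification-cost
    setting and return the stacked [intercept, coefficients] chromosomes."""
    population = []
    for w in weights:
        clf = LogisticRegression(class_weight={0: 1, 1: w}, max_iter=1000)
        clf.fit(X, y)
        population.append(np.r_[clf.intercept_, clf.coef_.ravel()])
    return np.array(population)

def predict(chromosome, X):
    # linear decision rule encoded by the chromosome
    return (chromosome[0] + X @ chromosome[1:] > 0).astype(int)

def fitness(chromosome, X, y, min_recall=0.8):
    """Overall accuracy, penalized when minority-class recall falls below
    the constraint (the penalty form is an assumption)."""
    pred = predict(chromosome, X)
    acc = (pred == y).mean()
    recall = pred[y == 1].mean() if (y == 1).any() else 0.0
    return acc - max(0.0, min_recall - recall) * 10.0

def evolve(X, y, generations=50, pop_size=20, sigma=0.1):
    pop = seed_population(X, y)
    # pad the seeded population with mutated copies up to pop_size
    while len(pop) < pop_size:
        pop = np.vstack([pop, pop[rng.integers(len(pop))]
                         + rng.normal(0, sigma, pop.shape[1])])
    for _ in range(generations):
        scores = np.array([fitness(c, X, y) for c in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]       # truncation selection
        cuts = rng.integers(1, pop.shape[1], size=len(parents))  # one-point crossover
        children = np.array([np.r_[a[:k], b[k:]] for a, b, k in
                             zip(parents, np.roll(parents, 1, axis=0), cuts)])
        children += rng.normal(0, sigma, children.shape)         # Gaussian mutation
        pop = np.vstack([parents, children])
    scores = np.array([fitness(c, X, y) for c in pop])
    return pop[scores.argmax()]
```

With labels in {0, 1} and class 1 as the minority, evolve(X, y) returns the best coefficient vector found, and predict(best, X) applies it as a linear classifier.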

Cited by 7 publications (4 citation statements)
References 7 publications
“…This involves either oversampling instances of the minority class or undersampling instances of the majority class. Oversampling involves the random duplication of instances from minority classes [15][16][17]. Undersampling involves the random removal of instances from majority classes.…”
Section: Data-based Methods
confidence: 99%
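As a minimal illustration of the two data-based strategies quoted above (not code from the cited works), the NumPy helpers below duplicate minority instances at random or drop majority instances at random; rebalancing to an exact 1:1 ratio is an assumption.

```python
# Random oversampling duplicates minority rows; random undersampling drops
# majority rows. Labels are assumed to be {0, 1} with the minority as 1.
import numpy as np

def random_oversample(X, y, minority=1, seed=0):
    rng = np.random.default_rng(seed)
    min_idx, maj_idx = np.where(y == minority)[0], np.where(y != minority)[0]
    extra = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
    keep = np.r_[maj_idx, min_idx, extra]          # duplicate minority rows
    return X[keep], y[keep]

def random_undersample(X, y, minority=1, seed=0):
    rng = np.random.default_rng(seed)
    min_idx, maj_idx = np.where(y == minority)[0], np.where(y != minority)[0]
    keep = np.r_[min_idx, rng.choice(maj_idx, size=len(min_idx), replace=False)]
    return X[keep], y[keep]                        # drop majority rows at random
```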
“…Another strategy is the threshold-moving technique in which the decision threshold is shifted in a manner that reduces bias towards the negative class [15][16][17][26]. It applies to classifiers that, given an input tuple, return a continuous output value.…”
Section: Algorithm-based Methods
confidence: 99%
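A small sketch of the threshold-moving idea quoted above, assuming a probabilistic classifier such as logistic regression: the model stays unchanged and only the decision threshold on its continuous output is shifted away from the default 0.5. Selecting the threshold by maximizing F1 on a held-out validation set is an assumption, not a detail from the cited works.

```python
# Threshold-moving sketch: tune the probability cutoff on validation data
# instead of using the default 0.5, to reduce bias toward the majority class.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def pick_threshold(clf, X_val, y_val):
    probs = clf.predict_proba(X_val)[:, 1]
    candidates = np.linspace(0.05, 0.95, 19)
    scores = [f1_score(y_val, (probs >= t).astype(int)) for t in candidates]
    return candidates[int(np.argmax(scores))]

# usage sketch (X_train, y_train, X_val, y_val, X_test are placeholders):
# clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# t = pick_threshold(clf, X_val, y_val)
# y_pred = (clf.predict_proba(X_test)[:, 1] >= t).astype(int)
```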
“…This often results in poorly estimated independent variable coefficients. One way to compensate is to undersample the majority class to rebalance the overall sample (Hlosta et al., 2013). To maximise the number of observed data points used in the logistic regressions, we used all data points in the 'teams' category and randomly selected an equal number of data points in the 'no teams' category.…”
Section: Methods
confidence: 99%
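A hypothetical sketch of the rebalancing step described in this statement: keep every minority observation, draw an equal-sized random sample from the majority class, and fit the logistic regression on the balanced subset. The function and label names below are placeholders, not taken from the citing paper.

```python
# Fit a logistic regression on a 1:1 balanced subsample: all minority rows
# plus an equal-sized random draw of majority rows.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_on_balanced_subsample(X, y, minority_label=1, seed=0):
    rng = np.random.default_rng(seed)
    min_idx = np.where(y == minority_label)[0]
    maj_idx = rng.choice(np.where(y != minority_label)[0],
                         size=len(min_idx), replace=False)
    keep = np.r_[min_idx, maj_idx]
    return LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
```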
“…Many methods have been presented to deal with the class imbalanced problem using various techniques [10], [11]. The idea of developing the algorithm to build the decision tree classifier that is suitable for classifying an imbalanced dataset is one of the methods that have received wide attention.…”
Section: Introduction
confidence: 99%