2013
DOI: 10.1007/978-3-642-28699-5_11

Overlapping, Rare Examples and Class Decomposition in Learning Classifiers from Imbalanced Data

Abstract: This paper deals with inducing classifiers from imbalanced data, where one class (a minority class) is under-represented in comparison to the remaining classes (majority classes). The minority class is usually of primary interest and it is required to recognize its members as accurately as possible. Class imbalance constitutes a difficulty for most algorithms learning classifiers as they are biased toward the majority classes. The first part of this study is devoted to discussing main properties of data that c…

Cited by 71 publications (62 citation statements)
References 50 publications
“…The degradation of classification performance is linked to other factors related to data distribution, such as the decomposition of the minority class into many rare sub-concepts playing a role of small disjuncts (Jo and Japkowicz 2004), the effect of too strong overlapping between the classes (Garcia et al 2007) or a presence of too many minority examples inside the majority class regions (Napierala and Stefanowski 2012). It has been shown that when these data difficulty factors occur together with class imbalance, they seriously hinder the recognition of the minority class (Napierala et al 2010; Napierala and Stefanowski 2012; Stefanowski 2013, 2016a). In the experimental analysis of Roughly Balanced Bagging (see Section 3) we will refer to some difficulty factors by analysing types of unsafe examples in the distribution of the minority class following the methodology presented in Stefanowski (2012, 2016).…”
Section: Preliminaries (mentioning)
confidence: 99%
“…Different kinds of minority class examples may have a different influence on learning classifiers [62,63]. To enrich the performed analysis, we have further divided the datasets into three subgroups, dense, medium and sparse, which represent different degrees of difficulty to recognize minority elements.…”
Section: Datasets (mentioning)
confidence: 99%
“…Inspired by [62], we use the local neighborhood of minority elements to consider them as safe, borderline, rare or outliers. In this work, we propose an alternative definition that is more conservative than the one used in [62].…”
Section: Datasets (mentioning)
confidence: 99%
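
The neighborhood-based labeling mentioned in the statement above can be sketched as follows. This is a minimal illustration, not the method of the citing paper or of [62]: the function name `label_minority_examples`, the choice of k = 5, Euclidean distance, and the count thresholds (4 to 5 same-class neighbours for safe, 2 to 3 for borderline, 1 for rare, 0 for outlier) are assumptions, since the quoted text does not state them.

```python
import numpy as np

def label_minority_examples(X, y, minority_label, k=5):
    """Label each minority example by how many of its k nearest
    neighbours belong to the minority class.
    The thresholds below are assumed and only make sense for k=5."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all examples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    labels = {}
    for i in np.flatnonzero(y == minority_label):
        neighbours = np.argsort(dist[i], kind="stable")[:k]
        same = int((y[neighbours] == minority_label).sum())
        if same >= 4:
            labels[int(i)] = "safe"
        elif same >= 2:
            labels[int(i)] = "borderline"
        elif same == 1:
            labels[int(i)] = "rare"
        else:
            labels[int(i)] = "outlier"
    return labels

# Hypothetical toy data: a compact minority cluster, a compact majority
# cluster, and one minority point placed inside the majority cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],        # minority
     [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5], [9.5, 10.2],  # majority
     [10.4, 10.6]]                                                   # minority
y = [1] * 6 + [0] * 6 + [1]
example_types = label_minority_examples(X, y, minority_label=1)
print(example_types)
```

The pairwise distance matrix keeps the sketch dependency-light but grows quadratically with the number of examples; a tree-based neighbour search would be the usual choice for larger datasets.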
“…Over the last years, however, research on this topic has also put the emphasis on studying the effect of imbalance together with other data complexity characteristics such as overlapping, small disjuncts and noisy data (He et al, 2015;López et al, 2013;Napierala et al, 2010;Prati et al, 2004;Stefanowski, 2013). Another critical subject that has attracted increasing interest in the scientific community is how to assess the performance of a classification model in the presence of imbalanced data sets because most common metrics (e.g., accuracy and error rates) strongly depend on the class distribution and assume equal misclassification costs, which may lead to distorted conclusions (He and Garcia, 2009;Menardi and Torelli, 2014).…”
Section: Introduction (mentioning)
confidence: 99%
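
The dependence of accuracy on class distribution noted in the statement above can be seen in a small worked example. This is an illustrative sketch only: the helper `binary_metrics`, the 5/95 class split, and the use of the geometric mean of sensitivity and specificity (G-mean) as the contrasting measure are assumptions chosen for the demonstration and are not taken from the quoted text.

```python
import math

def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity and G-mean for a
    binary problem, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }

# Hypothetical data: 5 minority and 95 majority examples, scored against
# a trivial classifier that always predicts the majority class.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here the trivial classifier reaches high accuracy while recognizing no minority example at all, which is why class-specific measures are generally reported alongside or instead of accuracy on imbalanced data.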