2015
DOI: 10.1007/s10844-015-0368-1

Types of minority class examples and their influence on learning classifiers from imbalanced data

Abstract: Many real-world applications reveal difficulties in learning classifiers from imbalanced data. Although several methods for improving classifiers have been introduced, the identification of conditions for the efficient use of a particular method is still an open research problem. It is also worth studying the nature of imbalanced data, the characteristics of the minority class distribution, and their influence on classification performance. However, current studies on imbalanced data difficulty factors have been …

Cited by 236 publications
(211 citation statements)
References 40 publications
“…For instance, if k = 5, the type of the example is assigned in the following way (Napierala and Stefanowski 2012; 2016): 5:0 or 4:1 - the example is labeled as a safe example; 3:2 or 2:3 - a borderline example; 1:4 - a rare example; 0:5 - an outlier. This rule can be generalized to higher k values; however, results of recent experiments (Napierala and Stefanowski 2016) show that they lead to a similar categorization of the considered datasets. Therefore, in the following study we stay with k = 5.…”
Section: Methods
confidence: 96%
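The labeling rule quoted above (count how many of a minority example's k = 5 nearest neighbours belong to the minority class, then map the ratio to safe / borderline / rare / outlier) can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' implementation: the function name `label_minority_examples` is hypothetical, and plain Euclidean distance is assumed.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def label_minority_examples(X, y, minority=1, k=5):
    """Assign a difficulty type to each minority-class example by counting
    how many of its k nearest neighbours share its class, following the
    rule cited above (Napierala and Stefanowski 2012; 2016).
    Hypothetical helper: name, signature, and distance choice are
    illustrative assumptions, not the paper's code."""
    labels = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority:
            continue
        # indices of the k nearest other examples, by Euclidean distance
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: dist(xi, X[j]),
        )[:k]
        same = sum(1 for j in neighbours if y[j] == minority)
        # 5:0 or 4:1 -> safe; 3:2 or 2:3 -> borderline; 1:4 -> rare; 0:5 -> outlier
        if same >= 4:
            labels[i] = "safe"
        elif same >= 2:
            labels[i] = "borderline"
        elif same == 1:
            labels[i] = "rare"
        else:
            labels[i] = "outlier"
    return labels
```

For example, a lone minority point surrounded by five majority points gets labeled an outlier, while a member of a tight minority cluster of six points is labeled safe.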
“…Recall that different difficulty factors could be considered: fragmentation of the minority class into small disjuncts, overlapping of decision boundaries, and the presence of rare cases, outliers, and noise (Stefanowski 2016a). Here we follow the methodology of Napierala and Stefanowski (2012, 2016), where most of these data difficulty factors can be modeled by distinguishing the following types of examples: safe examples (located in homogeneous regions populated by examples from one class only); borderline (placed close to the decision boundary between classes); rare examples (isolated groups of few examples located deeper inside the opposite class); or outliers.…”
Section: Methods
confidence: 99%
“…Also, even the more traditional learning tasks, such as training classifiers, are now reformulated in more demanding ways, for instance, by taking into account additional constraints or data properties, like unusual distributions of examples and/or imbalance of target classes (Fernández et al, 2017;Napierala and Stefanowski, 2016). Such enriched input to the induction process requires more advanced and complex algorithms.…”
Section: Types of Complex and Big Data
confidence: 99%