Overlapping, Rare Examples and Class Decomposition in Learning Classifiers from Imbalanced Data

Stefanowski, Jerzy

doi:10.1007/978-3-642-28699-5_11

Cited by 71 publications

(62 citation statements)

References 50 publications


“…The degradation of classification performance is linked to other factors related to data distribution, such as the decomposition of the minority class into many rare sub-concepts playing a role of small disjuncts (Jo and Japkowicz 2004), the effect of too strong overlapping between the classes (Garcia et al 2007) or a presence of too many minority examples inside the majority class regions (Napierala and Stefanowski 2012). It has been shown that when these data difficulty factors occur together with class imbalance, they seriously hinder the recognition of the minority class (Napierala et al 2010; Napierala and Stefanowski 2012; Stefanowski 2013, 2016a). In the experimental analysis of Roughly Balanced Bagging (see Section 3) we will refer to some difficulty factors by analysing types of unsafe examples in the distribution of the minority class following the methodology presented in Stefanowski (2012, 2016).…”

Section: Preliminaries (mentioning)

confidence: 99%

Multi-class and feature selection extensions of Roughly Balanced Bagging for imbalanced data

Lango

Stefanowski

2017

J Intell Inf Syst

Self Cite


Roughly Balanced Bagging is one of the most efficient ensembles specialized for class imbalanced data. In this paper, we study its basic properties that may influence its good classification performance. We experimentally analyze them with respect to bootstrap construction, deciding on the number of component classifiers, their diversity, and ability to deal with the most difficult types of the minority examples. Then, we introduce two generalizations of this ensemble for dealing with a higher number of attributes and for adapting it to handle multiple minority classes. Experiments with synthetic and real life data confirm usefulness of both proposals.



“…Different kinds of minority class examples may have a different influence on learning classifiers [62,63]. To enrich the performed analysis, we have further divided the datasets into three subgroups, dense, medium and sparse, which represent different degrees of difficulty to recognize minority elements.…”

Section: Datasets (mentioning)

confidence: 99%

“…Inspired by [62], we use the local neighborhood of minority elements to consider them as safe, borderline, rare or outliers. In this work, we propose an alternative definition that is more conservative than the one used in [62].…”

Section: Datasets (mentioning)

confidence: 99%
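The neighbourhood-based typing mentioned in the statement above can be illustrated with a minimal Python sketch. It is an illustration only: the function name `label_minority_examples`, the choice of k = 5, the Euclidean distance and the thresholds (4 to 5 same-class neighbours for safe, 2 to 3 for borderline, 1 for rare, 0 for outlier) are assumptions made here, and the sketch does not reproduce the more conservative definition proposed by the citing paper.

```python
import math

def label_minority_examples(points, labels, minority, k=5):
    """Map the index of each minority example to a neighbourhood-based type."""
    types = {}
    for i, point in enumerate(points):
        if labels[i] != minority:
            continue
        # Brute-force distances to every other example (fine for small data).
        neighbours = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(points)
            if j != i
        )[:k]
        same = sum(1 for _, j in neighbours if labels[j] == minority)
        # Illustrative thresholds for k = 5 (an assumption, not the citing paper's rule).
        if same >= 4:
            types[i] = "safe"
        elif same >= 2:
            types[i] = "borderline"
        elif same == 1:
            types[i] = "rare"
        else:
            types[i] = "outlier"
    return types

# Toy data (hypothetical): a compact minority cluster, a majority cluster,
# and one minority example lying inside the majority region.
minority_cluster = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.5, 0)]
majority_cluster = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (12, 12)]
isolated_minority = [(10.4, 10.4)]

points = minority_cluster + majority_cluster + isolated_minority
labels = ["min"] * 6 + ["maj"] * 6 + ["min"]
types = label_minority_examples(points, labels, minority="min")
print(types)
```

In this toy data the six clustered minority examples have only minority neighbours, while the isolated one has only majority neighbours, so the two groups end up at opposite ends of the scale.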


EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data

et al. 2016


Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a challenge to standard machine learning techniques. We propose a new hybrid method specifically tailored to handle class imbalance, called EPRENNID. It performs an evolutionary prototype reduction focused on providing diverse solutions to prevent the method from overfitting the training set. It also allows us to explicitly reduce the underrepresented class, which the most common preprocessing solutions handling class imbalance usually protect. As part of the experimental study, we show that the proposed prototype reduction method outperforms state-of-the-art preprocessing techniques. The preprocessing step yields multiple prototype sets that are later used in an ensemble, performing a weighted voting scheme with the nearest neighbor classifier. EPRENNID is experimentally shown to significantly outperform previous proposals.


“…Over the last years, however, research on this topic has also put the emphasis on studying the effect of imbalance together with other data complexity characteristics such as overlapping, small disjuncts and noisy data (He et al, 2015;López et al, 2013;Napierala et al, 2010;Prati et al, 2004;Stefanowski, 2013). Another critical subject that has attracted increasing interest in the scientific community is how to assess the performance of a classification model in the presence of imbalanced data sets because most common metrics (e.g., accuracy and error rates) strongly depend on the class distribution and assume equal misclassification costs, which may lead to distorted conclusions (He and Garcia, 2009;Menardi and Torelli, 2014).…”

Section: Introduction (mentioning)

confidence: 99%
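The point about accuracy depending on the class distribution can be made concrete with a small Python sketch. The data, the helper name `evaluate` and the use of the geometric mean of sensitivity and specificity (one commonly used alternative, not a measure named in the statement) are assumptions chosen for illustration.

```python
import math

def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity and their geometric mean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # minority-class recall
    specificity = tn / (tn + fp) if tn + fp else 0.0  # majority-class recall
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }

# Hypothetical test set: 5 minority (1) and 95 majority (0) examples,
# scored against a trivial classifier that always predicts the majority class.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
scores = evaluate(y_true, y_pred)
print(scores)
```

The trivial classifier reaches high accuracy purely because of the 95:5 class ratio, yet it recognises no minority example, which the sensitivity and the geometric mean expose.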

Associative learning on imbalanced environments: An empirical study

Cleofas-Sánchez

Sánchez

García

et al. 2016

Expert Systems with Applications


Associative memories have emerged as a powerful computational neural network model for several pattern classification problems. Like most traditional classifiers, these models assume that the classes share similar prior probabilities. However, in many real-life applications the ratios of prior probabilities between classes are extremely skewed. Although the literature has provided numerous studies that examine the performance degradation of renowned classifiers on different imbalanced scenarios, so far this effect has not been supported by a thorough empirical study in the context of associative memories. In this paper, we fix our attention on the applicability of the associative neural networks to the classification of imbalanced data. The key questions here addressed are whether these models perform better, the same or worse than other popular classifiers, how the level of imbalance affects their performance, and whether distinct resampling strategies produce a different impact on the associative memories. In order to answer these questions and gain further insight into the feasibility and efficiency of the associative memories, a large-scale experimental evaluation with 31 databases, seven classification models and four resampling algorithms is carried out here, along with a non-parametric statistical test to discover any significant differences between each pair of classifiers.

