2013
DOI: 10.1016/j.ins.2013.07.007

An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics

citations
Cited by 1,358 publications
(873 citation statements)
references
References 109 publications
“…Learning from imbalanced data is a challenging task that has gained attention over the last few years [28], [35], [41]. In contrast with traditional classification, it deals with datasets where one or more classes are underrepresented.…”
mentioning
confidence: 99%
“…This happens because the classifier focuses on global measures that do not take into account the class data distribution [28], [35], [41]. Nevertheless the most interesting information is often found within the minority class.…”
mentioning
confidence: 99%
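The point made in the statement above, that a global measure can hide poor minority-class performance, can be illustrated with a minimal sketch. The labels below are hypothetical (95 majority examples, 5 minority examples) and the helper names `accuracy` and `per_class_recall` are chosen for illustration only; they do not come from the cited papers.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Global measure: fraction of all examples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    """Class-wise measure: fraction of each class predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}

# Hypothetical imbalanced labels: class 0 is the majority, class 1 the minority.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
print(acc, recalls)
```

Under these assumptions the global accuracy looks high while the minority-class recall is zero, which is the kind of gap that class-aware measures are meant to expose.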
“…This could be a step forward in determining which intrinsic features of the data are affecting the classifiers [2], and whether the performance of a classifier can be predicted based upon the available data [30]. However, note that, although the negative correlation between ID and the performance is expected to decrease as long as the class-imbalance techniques alleviate the hindering effect of the class distribution, there might exist other hindering aspects [23] which may harm the performance of the classifiers.…”
Section: Results
mentioning
confidence: 99%
“…Several works propose the use of synthetic data to improve datasets which suffer of imbalanced class distributions, including non-heuristic methods such as random undersampling or oversampling [20], and those that use some kind of interpolation for oversampling the training sets [21,22]. In our case, the imbalanced datasets were improved by random oversampling so that the experiments used actual log files (logs) generated by the firewall of an actual operating infrastructure in combination with synthetic registers generated through expert knowledge.…”
Section: Methodology
mentioning
confidence: 99%
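Random oversampling, mentioned in the statement above, can be sketched as follows. This is a minimal illustration that assumes list-based features and labels; the function name `random_oversample`, the `seed` parameter, and the toy data are hypothetical and are not taken from the citing paper, which does not describe its implementation.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of each smaller class
    until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Pool of original examples belonging to this class.
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))  # sample with replacement
            y_out.append(label)
    return X_out, y_out

# Hypothetical toy data: 8 majority examples (label 0), 2 minority (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Because the added examples are exact copies, this approach adds no new information; the interpolation-based methods the statement also mentions generate new synthetic points instead.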