Published: 2019
DOI: 10.1007/s13748-019-00172-4

Instance selection improves geometric mean accuracy: a study on imbalanced data classification

Abstract: A natural way of handling imbalanced data is to attempt to equalise the class frequencies and train the classifier of choice on balanced data. For two-class imbalanced problems, the classification success is typically measured by the geometric mean (GM) of the true positive and true negative rates. Here we prove that GM can be improved upon by instance selection, and give the theoretical conditions for such an improvement. We demonstrate that GM is non-monotonic with respect to the number of retained instances…
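The abstract's success measure, GM, is the geometric mean of the true positive rate (sensitivity) and the true negative rate (specificity). As a loose illustration of the claim that training on balanced data can improve GM, the sketch below compares a linear classifier trained on a synthetic imbalanced set with one trained after random undersampling of the majority class. The data set, classifier, and undersampling step are assumptions for illustration only, not the instance-selection method analysed in the paper.

```python
# Illustrative sketch (assumed data and classifier, not the paper's method):
# compare GM on the full imbalanced training set vs. a randomly undersampled
# (balanced) one.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def geometric_mean(y_true, y_pred):
    """GM = sqrt(true positive rate * true negative rate)."""
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return np.sqrt(tpr * tnr)

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Classifier trained on the imbalanced data as given
full = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Classifier trained after randomly discarding majority-class instances
rng = np.random.default_rng(0)
pos = np.flatnonzero(y_tr == 1)
neg = rng.choice(np.flatnonzero(y_tr == 0), size=len(pos), replace=False)
keep = np.concatenate([pos, neg])
balanced = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])

print("GM (full training set):        ", round(geometric_mean(y_te, full.predict(X_te)), 3))
print("GM (undersampled training set):", round(geometric_mean(y_te, balanced.predict(X_te)), 3))
```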

Cited by 59 publications (22 citation statements)
References 44 publications
“…The experiments in the first block were performed on artificial data sets taken from the paper by Napierala et al (2010) because using synthetic data allows us to know their characteristics a priori and analyze the effects of resampling in a fully controlled environment. The second group of experiments was on a well-known benchmark suite of real-life databases widely used for class imbalance problems (Chen et al, 2019; Jing et al, 2019; Kovács, 2019; Kuncheva et al, 2019; Lopez-Garcia et al, 2019), which are all available at the KEEL database repository (Alcalá-Fdez et al, 2011). The results of both experiments were estimated by 5-fold stratified cross-validation in order to have a sufficient amount of positive examples in the test partitions.…”
Section: Methods
confidence: 99%
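The estimation protocol quoted above is 5-fold stratified cross-validation. A minimal sketch of that protocol, run on an assumed synthetic data set rather than the KEEL benchmarks cited in the snippet, shows how stratification keeps positive examples in every test partition:

```python
# Minimal sketch: stratified 5-fold CV preserves the class ratio per fold,
# so each test partition retains minority-class (positive) examples.
# The synthetic data set is an assumption for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=500, weights=[0.95, 0.05], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    n_pos = int(np.sum(y[test_idx] == 1))
    print(f"fold {fold}: {len(test_idx)} test instances, {n_pos} positive")
```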
“…For a binary classification problem, the classification performance is typically measured by the geometric mean (G-Mean) of the true-positive and the true-negative rates [35]. G-Mean is a measure for imbalanced classification that can be optimized to achieve a balance between sensitivity and specificity.…”
Section: Covid-19 Detection Analysis
confidence: 99%
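One common way to act on the observation that G-Mean "can be optimized" is to sweep the decision threshold of a probabilistic classifier and keep the threshold that maximizes sqrt(TPR × TNR). The sketch below does so on an assumed synthetic data set with logistic regression; it illustrates the idea, not the procedure of the cited study.

```python
# Hedged sketch: pick the decision threshold that maximizes
# G-Mean = sqrt(sensitivity * specificity). Data and model are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, proba)
gmeans = np.sqrt(tpr * (1.0 - fpr))   # G-Mean at every candidate threshold
best = int(np.argmax(gmeans))
print(f"best threshold = {thresholds[best]:.3f}, G-Mean = {gmeans[best]:.3f}")
```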
“…In the last step of the above procedure, the ACC, SEN, SPE, and GM [33] are defined by ACC = (TP + TN) / (TP + TN + FP + FN), SEN = TP / (TP + FN), SPE = TN / (TN + FP), and GM = sqrt(SEN × SPE), where ‘TP’ and ‘TN’ are short for ‘True Positive’ and ‘True Negative’, respectively, and FP and FN denote false positives and false negatives.…”
Section: Numerical Results
confidence: 99%
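For concreteness, the four metrics reconstructed above follow directly from confusion-matrix counts; the counts in the sketch below are made-up numbers used only to exercise the formulas.

```python
# Minimal sketch of the metric definitions quoted above (ACC, SEN, SPE, GM).
# The confusion-matrix counts are invented for illustration.
import math

def metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    sen = tp / (tp + fn)                    # sensitivity (true positive rate)
    spe = tn / (tn + fp)                    # specificity (true negative rate)
    gm = math.sqrt(sen * spe)               # geometric mean of SEN and SPE
    return acc, sen, spe, gm

acc, sen, spe, gm = metrics(tp=40, tn=900, fp=50, fn=10)
print(f"ACC={acc:.3f}  SEN={sen:.3f}  SPE={spe:.3f}  GM={gm:.3f}")
```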