2021
DOI: 10.1016/j.asoc.2021.107219
How to design the fair experimental classifier evaluation

Cited by 62 publications (24 citation statements)
References 29 publications

“…In order to assess whether DeepSMOTE returns statistically significantly better results than the reference resampling algorithms, we use the Friedman test with Shaffer post-hoc test [100] and the Bayesian Wilcoxon signed-rank test [101] for statistical comparison over multiple datasets. Both tests used a statistical significance level of 0.05.…”
Section: Statistical Analysis Of Results (mentioning)
confidence: 99%
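
The Friedman test mentioned in the statement above compares several algorithms over multiple datasets by ranking them within each dataset. The following is a minimal sketch of the test statistic only, assuming higher scores are better; the function names and the example scores are hypothetical and are not taken from the cited papers. It omits the tie correction, the p-value (the statistic is usually compared against a chi-square distribution with k - 1 degrees of freedom), and the Shaffer post-hoc procedure.

```python
def average_ranks(scores):
    """Mean rank of each algorithm; scores[i][j] is algorithm j on dataset i.

    Rank 1 goes to the best (highest) score; tied scores share the mean rank.
    """
    n_datasets = len(scores)
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            # Extend the group while the next score ties with the current one.
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1..end+1
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / n_datasets for t in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction)."""
    n = len(scores)
    k = len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


# Hypothetical scores: 4 datasets (rows) x 3 algorithms (columns).
example_scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.95, 0.70, 0.75],
    [0.90, 0.85, 0.80],
]
print(average_ranks(example_scores), friedman_statistic(example_scores))
```

A large statistic suggests that at least one algorithm ranks consistently differently from the others; a post-hoc test is then needed to say which pairs differ.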
“…To further verify the superiority of HMCBCG over bagging, the experimental results of KNE, KNU and DESKNN under different k_max values are subjected to paired t-tests with bagging, respectively. The paired t-test is recommended for the comparison of two classifiers on one dataset [53, 54]. A p-value less than 0.05 is considered statistically significant in this study.…”
Section: Experiments and Results (mentioning)
confidence: 99%
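
The paired t-test referred to in the statement above works on the per-fold differences between two classifiers evaluated on the same splits of one dataset. A minimal sketch of the statistic follows, using only the standard library; the function name and the fold scores are hypothetical. It returns the t value and the degrees of freedom, and leaves the p-value lookup (Student's t distribution) to a statistics package.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-fold accuracies of two classifiers on the same 5 folds.
fold_scores_a = [0.82, 0.85, 0.80, 0.88, 0.84]
fold_scores_b = [0.80, 0.82, 0.79, 0.85, 0.83]
t_value, dof = paired_t_statistic(fold_scores_a, fold_scores_b)
print(t_value, dof)
```

The pairing matters: both classifiers must be scored on identical folds, otherwise the differences mix classifier effects with split effects.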
“…Empirical evidence proves that accuracy is strongly biased to favor the majority class and might produce misleading conclusions. This fact motivated a search for new balanced measures obtaining a trade-off between positive and negative class performances [16]. Examples of such metrics are the arithmetic (eq.5), geometric (eq.6 or eq.7) or harmonic means (eq.9) between the two components: recall and precision (or specificity).…”
Section: A. Imbalanced Data Classification, 1) Metrics (mentioning)
confidence: 99%
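
The statement above names arithmetic, geometric and harmonic means of per-class components without showing its equations, so the sketch below uses commonly used forms as an assumption: balanced accuracy (arithmetic mean of recall and specificity), G-mean (geometric mean of recall and specificity) and F1 (harmonic mean of precision and recall). The function name and the confusion-matrix counts are hypothetical.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Compute accuracy and three balanced measures from binary confusion counts."""
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "balanced_accuracy": (recall + specificity) / 2,       # arithmetic mean
        "g_mean": math.sqrt(recall * specificity),             # geometric mean
        "f1": 2 * precision * recall / (precision + recall),   # harmonic mean
    }


# Hypothetical imbalanced problem: 50 positives, 950 negatives.
metrics = imbalance_metrics(tp=40, fn=10, fp=50, tn=900)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy stays high because the majority class dominates, while the balanced measures are pulled down by the weaker minority-class performance, which is the bias the statement describes.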