2022
DOI: 10.1016/j.bdr.2022.100314
Downsampling for Binary Classification with a Highly Imbalanced Dataset Using Active Learning

Cited by 35 publications (12 citation statements)
References 18 publications
“…Such class imbalances increase the likelihood of developing ‘naïve’ learners featuring low false‐positive counts (type I error) but high false‐negative counts (type II error). To mitigate the effects of such a strongly zero‐inflated distribution and potentially decrease type II error, additional BRT and ANN models were generated using a downsampled version of the training dataset with the original number of positive cases, but with 50% of the negative cases randomly excluded (Barros et al, 2019; Lee & Seo, 2021), increasing the proportion of positive cases from 5.8% to 11.0% of complete observations (those without any missing values for any predictor variable).…”
Section: Methods (mentioning)
confidence: 99%
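The random-exclusion step described in this excerpt can be sketched in a few lines. This is a minimal illustration, not the cited authors' code; `downsample_negatives` and its parameters are hypothetical names, and the 58/942 split below is chosen only to mirror the 5.8% positive rate mentioned above.

```python
import random

def downsample_negatives(X, y, frac=0.5, seed=42):
    """Keep all positive (label 1) samples and randomly exclude
    a fraction `frac` of the negative (label 0) samples."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    kept_neg = rng.sample(neg, int(len(neg) * (1 - frac)))
    keep = sorted(pos + kept_neg)
    return [X[i] for i in keep], [y[i] for i in keep]

# 5.8% positives before downsampling; dropping half the negatives
# roughly doubles the positive proportion, as in the excerpt.
X = list(range(1000))
y = [1] * 58 + [0] * 942
X_ds, y_ds = downsample_negatives(X, y)
print(sum(y_ds) / len(y_ds))  # ≈ 0.11
```

Because positives are untouched, only the class ratio changes; the type II error pressure from the zero-inflated distribution is reduced at the cost of discarding some negative information.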
“…Such class imbalances increase the likelihood of developing ‘naïve’ learners featuring low false‐positive counts (type I error) but high false‐negative counts (type II error). To mitigate the effects of such a strongly zero‐inflated distribution and potentially decrease type II error, additional BRT and ANN models were generated using a downsampled version of the training dataset with the original number of positive cases, but with 50% of the negative cases randomly excluded (Barros et al, 2019; Lee & Seo, 2021), increasing the proportion of positive cases from 5.8% to 11.0% of complete observations (those without any missing values for any predictor variable).…”
Section: Methodsmentioning
confidence: 99%
“…The metrics are the Geometric Mean (GM) and the Area Under the ROC Curve (AUC). GM is the geometric mean of sensitivity and specificity [18]. The ROC curve visualizes the balance achieved between the true-positive rate and the false-positive rate [19].…”
Section: Evaluation Metrics (mentioning)
confidence: 99%
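The GM metric named in this excerpt is straightforward to compute from a confusion matrix. A minimal sketch (the function name `geometric_mean_score` is illustrative, not from the cited paper):

```python
import math

def geometric_mean_score(y_true, y_pred):
    """Geometric mean of sensitivity (TPR) and specificity (TNR),
    robust to class imbalance because both rates are per-class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
# sensitivity = 3/4, specificity = 2/4 -> sqrt(0.375) ≈ 0.612
print(geometric_mean_score(y_true, y_pred))
```

Unlike accuracy, GM collapses to 0 if either class is predicted entirely wrong, which is why it is favored for imbalanced binary classification.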
“…An important note is that informative samples can come from either the minority or the majority class, rather than being selected from the majority alone. The developed algorithm showed better performance than other resampling methods while using a smaller sample size (Lee & Seo, 2022).…”
Section: Introduction (unclassified)
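The idea that informative samples may belong to either class can be sketched with a simple uncertainty-based selection rule: rank samples by how close a model's predicted probability is to 0.5 and keep the most ambiguous ones. This is only an assumed illustration of class-agnostic informative-sample selection; the actual active-learning criterion in Lee & Seo (2022) may differ, and `select_informative` is a hypothetical helper.

```python
def select_informative(indices, probs, budget):
    """Return the `budget` indices whose predicted probability is
    closest to 0.5, i.e. the most uncertain (informative) samples,
    regardless of their class label."""
    ranked = sorted(indices, key=lambda i: abs(probs[i] - 0.5))
    return ranked[:budget]

probs = [0.95, 0.51, 0.10, 0.48, 0.70, 0.02]
labels = [1, 1, 0, 0, 1, 0]
selected = select_informative(range(len(probs)), probs, 3)
print(selected)                       # [1, 3, 4]
print([labels[i] for i in selected])  # [1, 0, 1] -- both classes kept
```

The selected subset mixes minority and majority samples, matching the excerpt's point that selection should not be restricted to the majority class.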