2018
DOI: 10.1080/24751839.2018.1440454

Imbalanced data classification using MapReduce and relief

Abstract: Classification of imbalanced data has been reported to require modification of standard classification algorithms, and it has lately attracted a lot of attention due to practical applications in industry, banking and finance. The aim of the paper is to examine algorithms known from the literature when two modifications are introduced: MapReduce to parallelize computations and Relief to select the most valuable attributes. Both modifications are needed in the Big Data area. Two new algorithms are also considered.
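As a rough illustration of the attribute-selection idea named in the abstract, the sketch below implements the basic two-class Relief weighting in plain Python. It is not the paper's implementation: the function name relief_weights, its parameters, and the toy data are assumptions made for this example only. Each sampled instance raises the weight of attributes that differ from its nearest neighbour of the other class (nearest miss) and lowers the weight of attributes that differ from its nearest neighbour of the same class (nearest hit).

```python
import random


def relief_weights(X, y, n_samples=None, seed=0):
    """Basic two-class Relief (illustrative sketch, hypothetical helper).

    X is a list of numeric feature rows, y a list of class labels.
    Returns one weight per feature; larger means more relevant.
    """
    n, d = len(X), len(X[0])

    # Feature ranges scale every per-feature difference into [0, 1].
    spans = []
    for f in range(d):
        col = [row[f] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(d))

    rng = random.Random(seed)
    m = n_samples or n
    # Use every instance once by default, otherwise sample with replacement.
    picks = list(range(n)) if m == n else [rng.randrange(n) for _ in range(m)]

    w = [0.0] * d
    for i in picks:
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbour available
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(d):
            w[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / m
    return w


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
print(weights)
```

On this toy data the separating attribute ends up with a positive weight and the noisy one with a negative weight, so a threshold on the weights would keep only the first attribute. Because each instance's contribution is an independent term in a sum, this kind of update could plausibly be split across workers and added up afterwards, though how the paper combines Relief with MapReduce is not stated in the abstract.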

Citations: cited by 5 publications (8 citation statements)
References: 15 publications
“…bags of not clearly labeled instances [120, 121], dealing with non-monotonic relationships [9, 31], dealing with survival data (i.e. data exploring the duration of time until one or more events happen) [6], dealing with imbalanced data [88, 49], clustering [24], and feature extraction [105].…”
Section: A Review Of Relief-based Algorithms (mentioning)
confidence: 99%
“…However, the protein was not quantified for this cohort. The imbalanced distribution of individuals without kidney dysfunction in this group of SCD patients likely affects the performance of the different regression models (Jedrzejowicz et al, 2018), tending to be biased toward the normal ranges (KrishnaVeni and Sobha, 2011) and potentially failing to identify possible signals.…”
Section: Limitations (mentioning)
confidence: 99%
“…Yeast dataset (Hu et al, 2015) have 8 real attributes with 1,484 instances. Various kinds of datasets from the Keel dataset repository (Verbiest et al, 2012; Ahmed et al, 2019; Gong and Kim, 2017; Jedrzejowicz et al, 2018; Fernández et al, 2017; Wang, 2019) are mostly used in handling imbalanced datasets. Liver-Disorders-Bupa (Ebenuwa et al, 2019) contains 345 instances with 7 attributes where attribute types are Categorical, integer and real.…”
Section: Used Dataset In Researches (mentioning)
confidence: 99%