2014
DOI: 10.1016/j.ins.2014.03.043
On the use of MapReduce for imbalanced big data using Random Forest

Cited by 258 publications (78 citation statements)
References 43 publications
“…SMOTE-based oversampling methods applied in distributed environments such as MapReduce tend to fail [13]. This can be caused by the random partitioning of the data across mappers, which leads to artificial samples being generated from real objects that have no spatial relationship to one another.…”
Section: Imbalanced Big Data (mentioning)
Confidence: 99%
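A small sketch can make the failure mode in the excerpt above concrete. The snippet below is an illustration only (it is not code from the cited work, and the number of mappers is a hypothetical parameter): it randomly partitions an imbalanced dataset and then applies SMOTE independently inside each partition, so synthetic points are interpolated between minority objects that may not be true neighbours in the full dataset. It assumes scikit-learn and imbalanced-learn are installed.

# Sketch: per-partition SMOTE under random partitioning (assumed setup, not
# the cited paper's code). Requires scikit-learn and imbalanced-learn.
import numpy as np
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Imbalanced two-class dataset (about 5% minority).
X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05], random_state=0)

n_mappers = 4                                    # hypothetical number of map tasks
rng = np.random.RandomState(0)
partitions = np.array_split(rng.permutation(len(X)), n_mappers)  # random split, as in MapReduce

for m, idx in enumerate(partitions):
    X_m, y_m = X[idx], y[idx]
    # Each "mapper" oversamples using only its local minority points, so the
    # synthetic samples interpolate between objects that may have no spatial
    # relationship in the complete dataset.
    X_res, y_res = SMOTE(k_neighbors=3, random_state=0).fit_resample(X_m, y_m)
    print(f"mapper {m}: {np.bincount(y_m)} -> {np.bincount(y_res)}")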
“…For this problem, two pre-processing algorithms were applied. First, the Random OverSampling (ROS) algorithm used in [18] was applied to replicate minority-class instances from the original dataset until the number of instances in both classes was equal, totaling 65 million instances. Finally, for the DITFS algorithm, the dataset was discretized using the Minimum Description Length Principle (MDLP) discretizer [19].…”
Section: Results (mentioning)
Confidence: 99%
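For readers unfamiliar with the ROS step in the excerpt, the following sketch shows the general idea under simple assumptions (a two-class NumPy dataset and plain replication with replacement; it is not the cited implementation): minority-class rows are drawn at random and appended until both classes contain the same number of instances.

# Sketch of Random OverSampling (ROS) for a two-class problem (assumed,
# simplified version; not the cited implementation).
import numpy as np

def random_oversample(X, y, seed=0):
    # Replicate minority-class rows at random until class counts are equal.
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = counts.max() - counts.min()
    extra = rng.choice(np.flatnonzero(y == minority), size=n_extra, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

# Toy usage: 9 majority vs. 3 minority instances become 9 vs. 9.
X = np.arange(24).reshape(12, 2)
y = np.array([0] * 9 + [1] * 3)
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))   # [9 9]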
“…Using no reducer at all is also possible. For example, del Río et al. [63] used only multiple mappers (no reducer) for network intrusion detection with rForest. Since several decision trees are generated across the mappers, they considered using all of their outcomes (i.e.…”
Section: Implementation Examples (mentioning)
Confidence: 99%
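The map-only design described in the excerpt can be sketched as follows, assuming scikit-learn's RandomForestClassifier as a stand-in for the paper's Hadoop-based forest and a hypothetical number of map tasks: each mapper trains an independent forest on its own data partition, and since there is no reducer, classification simply aggregates the outputs of all local forests (here by averaging their class probabilities, a soft vote over every tree produced by every mapper).

# Sketch of a "map-only" Random Forest (assumed setup; not del Río et al.'s
# Hadoop implementation). Requires scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=6_000, weights=[0.9, 0.1], random_state=0)
X_train, y_train, X_test, y_test = X[:5000], y[:5000], X[5000:], y[5000:]

n_mappers = 4                                    # hypothetical number of map tasks
parts = np.array_split(np.random.RandomState(0).permutation(len(X_train)), n_mappers)

# "Map" phase: one independent forest per data partition.
forests = [
    RandomForestClassifier(n_estimators=25, random_state=m).fit(X_train[idx], y_train[idx])
    for m, idx in enumerate(parts)
]

# No reduce phase: average the class probabilities of all local forests,
# i.e. a soft vote over every tree generated by every mapper.
probs = np.mean([f.predict_proba(X_test) for f in forests], axis=0)
y_pred = probs.argmax(axis=1)
print("accuracy:", (y_pred == y_test).mean())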