2015 IEEE International Advance Computing Conference (IACC)
DOI: 10.1109/iadcc.2015.7154739

Enhanced SMOTE algorithm for classification of imbalanced big-data using Random Forest

Abstract: In the era of big data, applications that generate tremendous amounts of data have become the main focus of attention, given the wide increase in data generation and storage that has taken place in the last few years. This scenario is challenging for data mining techniques, which are not adapted to the new space and time requirements. In many real-world applications, the classification of imbalanced data-sets is the point of attraction. Most classification methods focus on two-class imbalanced probl…
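For context, the algorithm the title references builds on classic SMOTE, which balances a training set by interpolating synthetic minority-class examples between nearest neighbors. Below is a minimal sketch of that baseline interpolation step; the function name, parameters, and use of scikit-learn are illustrative assumptions, not the authors' enhanced implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_minority, n_synthetic, k=5, rng=None):
    """Generate synthetic minority samples by interpolating between each
    sample and one of its k nearest minority-class neighbors (the core
    idea of classic SMOTE; names and defaults here are illustrative)."""
    rng = np.random.default_rng(rng)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    # Drop column 0: each point is its own nearest neighbor.
    neighbor_idx = nn.kneighbors(X_minority, return_distance=False)[:, 1:]
    synthetic = np.empty((n_synthetic, X_minority.shape[1]))
    for i in range(n_synthetic):
        j = rng.integers(len(X_minority))             # pick a minority sample
        nb = X_minority[rng.choice(neighbor_idx[j])]  # pick one of its neighbors
        gap = rng.random()                            # interpolation factor in [0, 1)
        synthetic[i] = X_minority[j] + gap * (nb - X_minority[j])
    return synthetic
```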

Cited by 74 publications (29 citation statements)
References 12 publications
“…Finally, a preliminary study regarding multi-class imbalanced classification was introduced in [48]. This methodology consisted of two steps.…”
Section: Multi-class Imbalance
confidence: 99%
“…Specifically, "Data pre-processing studies" section contains the description for those techniques Data pre-processing [41][42][43][44][45][46][47][48] Cost-sensitive learning [41,[49][50][51] Applications on imbalanced Big Data [52][53][54] related to data pre-processing. "Cost-sensitive learning studies" section those approaches that carry out an algorithmic modification by means of a cost-sensitive learning.…”
Section: Addressing Imbalanced Classification In Big Data Problems: C…
confidence: 99%
“…Specifically, we used the LIBSVM tool [6] to build the SVM Model, and we used the Spark ML package to build the random forest model. As data sampling has been shown in [3,7] to improve the random forest classification performance, for the training of our random forest model, we oversampled each class by duplicating the examples to match the class with the greatest number of examples. This reduces the imbalance due to the distribution of the employers within different industries.…”
Section: Learning Algorithms
confidence: 99%
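The duplication-based oversampling described in the excerpt above can be sketched as follows: replicate each class's examples, with replacement, up to the size of the largest class, then train a random forest on the balanced set. This is an illustration of the stated idea assuming NumPy arrays and scikit-learn, not the cited authors' Spark ML pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def oversample_by_duplication(X, y, rng=None):
    """Duplicate examples of each class (sampling with replacement)
    until every class matches the size of the largest class."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c in classes:
        members = np.flatnonzero(y == c)
        # Keep all originals, then draw the shortfall with replacement.
        extra = rng.choice(members, size=target - len(members), replace=True)
        idx.extend(members)
        idx.extend(extra)
    idx = np.asarray(idx)
    return X[idx], y[idx]

# Hypothetical usage, assuming X_train and y_train are NumPy arrays:
# X_bal, y_bal = oversample_by_duplication(X_train, y_train, rng=0)
# clf = RandomForestClassifier(n_estimators=100).fit(X_bal, y_bal)
```

Unlike SMOTE, duplication adds no new information; it only increases the weight of minority-class examples during training, which is the imbalance reduction the excerpt describes.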
“…In our CIA system we consider an analysis of various strategies. The binarization technique avails itself of various categorization techniques [22]. Learning techniques ensure optimized feature extraction.…”
Section: II
confidence: 99%
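The "binarization technique" mentioned in the excerpt above commonly refers to decomposing a multi-class problem into two-class subproblems. A minimal one-vs-rest sketch under that reading follows; the class name and the choice of random forest base learners are assumptions for illustration, not the cited CIA system's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class OneVsRestForest:
    """One-vs-rest binarization: train one binary random forest per
    class, then predict the class whose forest is most confident.
    (An illustrative decomposition, not the cited system.)"""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One binary problem per class: that class vs. everything else.
        self.models_ = [
            RandomForestClassifier(n_estimators=100).fit(X, (y == c).astype(int))
            for c in self.classes_
        ]
        return self

    def predict(self, X):
        # Column k holds P(class == classes_[k]) from forest k.
        scores = np.column_stack([m.predict_proba(X)[:, 1] for m in self.models_])
        return self.classes_[scores.argmax(axis=1)]
```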