2018
DOI: 10.24215/16666038.18.e23

SMOTE-BD: An Exact and Scalable Oversampling Method for Imbalanced Classification in Big Data

Abstract: The volume of data in today's applications has meant a change in the way Machine Learning issues are addressed. Indeed, the Big Data scenario involves scalability constraints that can only be achieved through intelligent model design and the use of distributed technologies. In this context, solutions based on the Spark platform have established themselves as a de facto standard. In this contribution, we focus on a very important framework within Big Data Analytics, namely classification with imbalanced d…

Cited by 25 publications (10 citation statements)
References 11 publications
“…Regarding the SMOTE algorithm, in [16] a global SMOTE fully scalable solution was described, called SMOTE-BD. In order to cope with the potential data partitioning problems, the whole neighborhood of each minority class instance is taken into account.…”
Section: Imbalanced Classification In Big Data (mentioning)
confidence: 99%
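The partitioning problem mentioned in this statement can be illustrated with a toy example. The sketch below is not SMOTE-BD's implementation (the source does not show it); the data, the partition layout, and the helper `nearest_neighbor` are hypothetical. It only shows why a neighbor found inside one data partition can differ from the neighbor found over the whole minority class.

```python
import numpy as np

def nearest_neighbor(points, idx, candidates):
    """Return the index of the candidate closest to points[idx], excluding idx itself."""
    candidates = np.array([c for c in candidates if c != idx])
    dists = np.linalg.norm(points[candidates] - points[idx], axis=1)
    return int(candidates[np.argmin(dists)])

# Hypothetical one-dimensional minority class with two well-separated groups.
minority = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])

# Hypothetical partition that happens to hold instances 0 and 3 only.
partition = [0, 3]

# Neighbor search over the whole minority class (the "global" view).
global_nn = nearest_neighbor(minority, 0, range(len(minority)))

# Neighbor search restricted to the instances stored in one partition.
local_nn = nearest_neighbor(minority, 0, partition)

print("global neighbor of instance 0:", global_nn)
print("partition-local neighbor of instance 0:", local_nn)
```

In this toy layout the partition-local search pairs instance 0 with a point from the other group, so a synthetic sample interpolated between them would fall in the empty region between the two groups.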
“…Their work was compared with various oversampling techniques on imbalanced low- and high-dimensional datasets, achieving a promising result to guarantee performance in constructing NLP application. Later, Maria et al [21] proposed a SMOTE-BD method to tackle the problem of imbalanced classification in big data. Their proposed scalable approach for imbalanced classification in big data is constructed on the basis of SMOTE algorithm, which helps create new synthetic instances according to the neighborhood of minority class sample.…”
Section: Smote Methods (mentioning)
confidence: 99%
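The neighborhood-based generation of synthetic instances described in this statement can be sketched in a few lines. This is a minimal single-machine illustration of the basic SMOTE interpolation step, assuming Euclidean distance and brute-force neighbor search; it is not the distributed SMOTE-BD method, and the function name `smote_sketch` and its parameters are chosen here for illustration.

```python
import numpy as np

def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between each chosen
    minority instance and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)  # cannot have more neighbors than other instances

    # Brute-force pairwise Euclidean distances within the minority class.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # an instance is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # Pick a base instance and one of its neighbors for every new point.
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment base -> neighbor.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[picked] - minority[base])

# Usage: four minority points on the corners of the unit square.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(corners, n_new=10, k=2)
print(synthetic.shape)
```

Because every synthetic point lies on a segment between two existing minority instances, the new samples stay inside the region the minority class already occupies.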
“…The scalability constraints regarding the volume of data adjacent to Big Data (BD) security appliances, the inherent complexity of data centers work flows [36] and the properties of nonstructured information [37] are attainable by implementing appropriate preprocessing stages, especially if SL algorithms only consider overall accuracy without taking into account relative class distribution. Random Oversampling for Big Data (ROS-Big Data), Random Undersampling for Big Data (RUS-BigData), and Map Reduce (MR) are some methods responsible for resampling extensive concentrations of evenly distributed data.…”
Section: Related Work (mentioning)
confidence: 99%
“…Random Oversampling for Big Data (ROS-Big Data), Random Undersampling for Big Data (RUS-BigData), and Map Reduce (MR) are some methods responsible for resampling extensive concentrations of evenly distributed data. As the authors described in [37], such techniques were applied by unifying a SMOTE variation for BD, obtaining, at its best, a favorable number of synthetic samples, avoiding some overgeneralization shortcomings to which SL are susceptible when handling a vast number of observations. Analogously, in [38], SMOTE was optimized by three major adjustments:…”
Section: Related Work (mentioning)
confidence: 99%