2019
DOI: 10.1016/j.procs.2019.11.177

A Review on Data Cleansing Methods for Big Data

Cited by 147 publications (83 citation statements); references 6 publications.
“…For tackling issue (b), namely data heterogeneity, it is crucial to identify appropriate data cleaning techniques; see, e.g., Rahm and Hai Do (2000) and Ridzuan et al (2019) for overviews of traditional techniques and extensions for handling big data, respectively. In general, the elimination of noise (of which redundancy is one of the most prominent aspects) is recognized as fundamental for obtaining high-quality datasets and, at the same time, is an interactive, error prone and time consuming activity.…”
Section: Discussion (mentioning), confidence: 99%
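The statement above singles out redundancy as one of the most prominent forms of noise. A minimal sketch of exact-duplicate elimination can make this concrete. It assumes records are Python dicts, normalizes the chosen key fields (case-folding and collapsing whitespace), and keeps only the first record for each normalized key. The names `normalize` and `dedupe_records` and the choice of key fields are hypothetical illustrations, not the method of the reviewed paper. Practical big data pipelines usually add fuzzy or blocking-based matching on top of this.

```python
from typing import Iterable


def normalize(value: str) -> str:
    """Case-fold and collapse internal whitespace so trivially different spellings compare equal."""
    return " ".join(value.split()).casefold()


def dedupe_records(records: Iterable[dict], key_fields: tuple[str, ...]) -> list[dict]:
    """Keep the first record for each normalized key and drop later redundant copies.

    Hypothetical helper: exact matching on normalized key fields only.
    """
    seen: set[tuple[str, ...]] = set()
    kept: list[dict] = []
    for record in records:
        # Build a comparison key from the chosen fields; missing fields count as "".
        key = tuple(normalize(str(record.get(field, ""))) for field in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


if __name__ == "__main__":
    rows = [
        {"name": "Ada Lovelace", "city": "London"},
        {"name": "ada  lovelace", "city": "LONDON"},  # redundant copy after normalization
        {"name": "Alan Turing", "city": "Wilmslow"},
    ]
    print(dedupe_records(rows, ("name", "city")))
```

Keeping the first occurrence preserves input order, which keeps the step deterministic. A rule that merges the duplicates' fields would be a different design choice.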
“…[95] In the context of big data analysis, traditional data cleaning working sequentially cannot easily be applied to ever-growing complicated structures.[96] Thus, the parallel execution of any method should be in line with the five big data quality standards discussed previously. The data cleaning process is always performed before initiating the subsequent ML processes.…”
[Table fragment spilled into this snippet: methods listed include SVM, SVDD, KPCA, GKPCA and AANN (Figueiredo et al. [20]), NLPCA (Figueiredo and Cross [76]) and KPCA (Oh et al. [78]), evaluated by ROC where reported.]
Section: Data Cleaning (mentioning), confidence: 93%
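The statement above contrasts sequential cleaning with parallel execution for big data. A minimal sketch of the partition-and-map pattern behind parallel cleaning can show the idea: split the records into partitions, apply the same per-record rule to each partition independently, then concatenate the results in order. The cleaning rule is a hypothetical illustration: trim string fields and drop records with an empty `id`. The helper names are also hypothetical. For simplicity the sketch uses `ThreadPoolExecutor` from the standard library on one machine. CPU-bound work would normally use `ProcessPoolExecutor` or a distributed engine such as Apache Spark, which follow the same structure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def clean_record(record: dict) -> Optional[dict]:
    """Trim string fields; return None for records whose 'id' is missing or blank (hypothetical rule)."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    if not cleaned.get("id"):
        return None
    return cleaned


def clean_partition(part: list[dict]) -> list[dict]:
    """Apply the per-record rule to one partition, independently of all other partitions."""
    return [r for r in (clean_record(rec) for rec in part) if r is not None]


def split_into_partitions(records: list, n: int) -> list[list]:
    """Split records into at most n contiguous chunks of roughly equal size."""
    size = max(1, -(-len(records) // n))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def clean_in_parallel(records: list[dict], n_partitions: int = 4) -> list[dict]:
    """Clean partitions concurrently, then concatenate results in the original order."""
    chunks = split_into_partitions(records, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # map() preserves input order, so the merged output keeps record order.
        results = list(pool.map(clean_partition, chunks))
    return [row for chunk in results for row in chunk]


if __name__ == "__main__":
    rows = [
        {"id": " 1 ", "name": " Ada "},
        {"id": "", "name": "ghost"},  # dropped: blank id
        {"id": "2", "name": "Alan"},
    ]
    print(clean_in_parallel(rows, n_partitions=2))
```

Because each partition is cleaned without reference to the others, this pattern only fits record-local rules. Cross-record steps such as duplicate detection need an extra shuffle or grouping stage.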
“…As of now, there are two classifications of data cleansing: traditional data cleansing and data cleansing for big data. Traditional data cleansing techniques, such as Potter's Wheel and Intelliclean, are so called because they are not used to manage massive volumes of data [12].…”
Section: Data-cleaning Applications (mentioning), confidence: 99%