Proceedings of the 2017 VI International Conference on Network, Communication and Computing 2017
DOI: 10.1145/3171592.3171610

A Comparative Study of Data Skew in Hadoop


Cited by 5 publications (3 citation statements). References: 10 publications.
“…In such case, using HDFS blocks directly to estimate statistics and build models may lead to statistically incorrect or biased results. Another key issue is data skew, which describes the uneven distribution of the records leading to tasks with different execution times [67][68][69]. On computing clusters, the performance strongly depends on how evenly data are distributed among the nodes.…”
Section: Big Data Partitioning on Hadoop Clusters (mentioning, confidence: 99%)
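To make "uneven distribution of the records" concrete, here is a minimal sketch, assuming hash partitioning in the spirit of Hadoop's default `hash(key) mod R` scheme and measuring skew as the largest partition's record count divided by the mean (a common illustrative metric, not one defined in the quoted text). The function names `hash_partition` and `skew_ratio` and the workload are hypothetical; `zlib.crc32` stands in for the key hash so results are deterministic.

```python
from zlib import crc32


def hash_partition(keys, num_partitions):
    """Count records per partition using hash(key) mod R (crc32 as a stand-in hash)."""
    loads = [0] * num_partitions
    for key in keys:
        loads[crc32(key.encode()) % num_partitions] += 1
    return loads


def skew_ratio(loads):
    """Largest partition load divided by the mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# Hypothetical workload: one "hot" key accounts for 80% of the records.
keys = ["hot"] * 800 + [f"k{i}" for i in range(200)]
loads = hash_partition(keys, 4)
print(loads, round(skew_ratio(loads), 2))
```

Because every record with the key `"hot"` hashes to the same partition, that partition holds at least 800 of the 1000 records, so its task runs at least 3.2 times longer than an evenly loaded one would, regardless of how the other keys spread.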
“…1. Calculate the average number of records assigned to each partition $p_{\mathrm{avg}}$ according to estimated intermediate data tuples and Formula (9), and set the maximum amount of data to be allocated for each partition to $p_{\mathrm{avg}}$.…”
Section: Data Balanced Partition Algorithm (mentioning, confidence: 99%)
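The quoted step can be sketched as follows. Formula (9) is not reproduced in the quoted text, so this sketch assumes $p_{\mathrm{avg}} = \lceil N / R \rceil$, with $N$ the estimated number of intermediate tuples and $R$ the number of partitions. Only step 1 is quoted, so the assignment loop (largest keys first, filling the partition with the most spare capacity and splitting a key's tuples when it does not fit) is a hypothetical illustration, not the cited algorithm. The name `balanced_partition` is likewise hypothetical.

```python
import math


def balanced_partition(key_counts, num_partitions):
    """Hypothetical capacity-bounded assignment of estimated tuples to partitions.

    key_counts: estimated number of intermediate tuples per key.
    Returns (p_avg, partitions), where each partition maps key -> tuples assigned.
    """
    total = sum(key_counts.values())
    # Assumed form of p_avg: even share per partition, rounded up so all tuples fit.
    p_avg = math.ceil(total / num_partitions)
    partitions = [dict() for _ in range(num_partitions)]
    remaining = [p_avg] * num_partitions
    # Largest keys first; each partition receives at most p_avg tuples.
    for key, count in sorted(key_counts.items(), key=lambda kv: -kv[1]):
        while count > 0:
            # Pick the partition with the most spare capacity (first one on ties).
            i = max(range(num_partitions), key=lambda j: remaining[j])
            take = min(count, remaining[i])
            partitions[i][key] = partitions[i].get(key, 0) + take
            remaining[i] -= take
            count -= take
    return p_avg, partitions


# Hypothetical estimates: key "a" alone holds 70% of the tuples.
counts = {"a": 70, "b": 20, "c": 6, "d": 4}
p_avg, parts = balanced_partition(counts, 4)
print(p_avg, [sum(p.values()) for p in parts])
```

Since the total capacity $R \cdot p_{\mathrm{avg}}$ is at least $N$, the loop always finds spare room and terminates with no partition above $p_{\mathrm{avg}}$. The trade-off is that a heavy key may be split across several reducers, so its partial results would need a later merge.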
“…He et al. provided a comparative study on data skew in Hadoop in terms of architectures, main features, core algorithms, performance metrics, and evaluation methods, and summarized a few challenging problems as future research trends.…”
Section: Related Work (mentioning, confidence: 99%)