Proceedings of the 1st International Conference on Cloud Computing and Services Science 2011
DOI: 10.5220/0003391105740583
Handling Data Skew in MapReduce

Abstract: MapReduce systems have become popular for processing large data sets and are increasingly being used in e-science applications. In contrast to simple application scenarios like word count, e-science applications involve complex computations which pose new challenges to MapReduce systems. In particular, (a) the runtime complexity of the reducer task is typically high, and (b) scientific data is often skewed. This leads to highly varying execution times for the reducers. Varying execution times result i…
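The abstract's core observation, that an expensive reducer combined with skewed keys produces very uneven reducer execution times, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the quadratic cost function `cluster_cost`, the Zipf-like key counts, and the helper names `hash_partition` and `greedy_assign` are all hypothetical. The greedy longest-processing-time (LPT) assignment is a generic scheduling heuristic swapped in for illustration, not necessarily the method the paper proposes.

```python
import heapq


def cluster_cost(num_values: int) -> int:
    # Hypothetical cost model: reducer work grows quadratically with the
    # number of values grouped under one key (a "high runtime complexity" reducer).
    return num_values * num_values


def hash_partition(key_counts: dict[int, int], num_reducers: int) -> list[int]:
    """Hash-style partitioning: key k goes to reducer hash(k) % R.
    Returns the total estimated cost per reducer."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        # For small non-negative ints, Python's hash(key) == key.
        loads[hash(key) % num_reducers] += cluster_cost(count)
    return loads


def greedy_assign(key_counts: dict[int, int], num_reducers: int) -> list[int]:
    """Cost-aware assignment (LPT heuristic): place the most expensive key
    groups first, each on the currently least-loaded reducer."""
    loads = [0] * num_reducers
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    by_cost = sorted(key_counts.values(), key=cluster_cost, reverse=True)
    for count in by_cost:
        load, reducer = heapq.heappop(heap)  # least-loaded reducer
        load += cluster_cost(count)
        loads[reducer] = load
        heapq.heappush(heap, (load, reducer))
    return loads


if __name__ == "__main__":
    # Zipf-like skew: key k receives roughly 1000 / (k + 1) values.
    counts = {k: 1000 // (k + 1) for k in range(20)}
    print("hash loads:  ", hash_partition(counts, 4))  # [1061461, 295622, 146213, 91358]
    print("greedy loads:", greedy_assign(counts, 4))   # [1000000, 250000, 173508, 171146]
```

With four reducers, hash partitioning places the heaviest key together with several smaller keys on reducer 0, which ends up with more than ten times the cost of reducer 3. The greedy assignment isolates the heaviest key, but in this cost model that key alone costs 1,000,000 units, so no assignment of whole keys can bring the maximum below that value.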

Cited by 19 publications (2 citation statements); references 13 publications.
“…Various of research efforts have been conducted on the partition of reduce tasks. To facilitate reduce partition processes, historical data [12] and sampling results [13] have gained extensive attention. Though load balancing can be achieved dynamically according to these methods, none of them were verified in a real Hadoop system.…”
Section: Related Work (mentioning)
“…While theoretically infinite computing resources can be provided in a cloud, the unreasonable increment of mappers/reducers cannot achieve process efficiency, and may waste more storage to complete. Many optimization schemes have been proposed [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25].…”
Section: Introduction (mentioning)