2020
DOI: 10.1007/978-3-030-49345-5_9
Intelligent Data Compression Policy for Hadoop Performance Optimization

Cited by 11 publications (2 citation statements)
References 6 publications
“…This may lead to a potential network bottleneck, since the reduce phase may require a large amount of shuffled data from the mappers, which may execute across different nodes in various racks. This can incur a severe performance penalty due to run-time data transfers [10]. In a heterogeneous scenario, some nodes will obviously complete more maps than others; the extent of the difference varies with the inherent heterogeneity of the Hadoop cluster.…”
Section: Introduction
confidence: 99%
“…The majority of research attempts at improving Hadoop's performance have focused on optimizing the map phase through effective data placement preceding map operations [9,11-14], primarily via some mechanism for grouping related data blocks and co-locating them within the same cluster node. However, for many applications the amount of intermediate data produced after the map phase is huge [10]. A trivial implementation of the copy/shuffle phase, followed by arbitrary reducer decisions (both the number of reducers and the choice of nodes to act as reducers), is often found inadequate.…”
Section: Introduction
confidence: 99%
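The intermediate-data problem the citing papers describe is commonly mitigated by compressing map output before the shuffle, trading extra CPU for reduced network transfer. As a minimal sketch (standard Hadoop configuration properties, not the specific compression policy proposed in the cited paper; the codec choice is illustrative):

```xml
<!-- mapred-site.xml: compress intermediate (map-output) data before the
     shuffle phase. mapreduce.map.output.compress and its codec property
     are standard Hadoop settings; Snappy is shown as one common choice. -->
<configuration>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>
```

Whether such compression pays off depends on the workload: CPU-bound jobs may slow down, while shuffle-heavy jobs on congested networks typically benefit, which is precisely the trade-off an intelligent compression policy must weigh.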