2016
DOI: 10.1016/j.procs.2016.11.044

Data Preloading and Data Placement for MapReduce Performance Improving

Cited by 12 publications (3 citation statements)
References 13 publications
“…This algorithm reduces the delay of file transmission through file caching and transforms the optimization problem into a facility location problem. Spivak A et al. [19] proposed an approach for improving data placement. Their algorithm considers memory capacity, CPU count, and other node attributes, and uses the Hadoop Distributed File System (HDFS) cache to improve task performance.…”
Section: Cache Placement To Reduce Latency (mentioning)
confidence: 99%
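For context on the HDFS cache mentioned in the statement above: HDFS exposes centralized cache management through the DistributedFileSystem API, which lets an application ask DataNodes to keep a path's blocks pinned in memory before a job starts. The sketch below is only an illustration of that mechanism, not the cited paper's code; the NameNode URI, cache pool name, input path, and replication value are placeholders.

// Illustration only: pin a MapReduce job's input into the HDFS centralized
// cache so DataNodes hold the blocks in off-heap memory before computation.
// Requires HDFS cache-admin privileges for the pool operation.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class PreloadInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        DistributedFileSystem dfs = (DistributedFileSystem) fs;

        // A cache pool groups directives and carries quotas and permissions.
        dfs.addCachePool(new CachePoolInfo("mapreduce-preload"));

        // Ask the NameNode to cache the input directory's blocks on DataNodes.
        CacheDirectiveInfo directive = new CacheDirectiveInfo.Builder()
                .setPath(new Path("/jobs/wordcount/input"))
                .setPool("mapreduce-preload")
                .setReplication((short) 1)  // cached replicas per block
                .build();
        long directiveId = dfs.addCacheDirective(directive);
        System.out.println("Cache directive registered, id=" + directiveId);
    }
}

The same effect can be obtained from the command line with the hdfs cacheadmin tool (-addPool and -addDirective).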
“…Ko et al. [16] addressed the overhead of I/O data block processing in virtualized Hadoop clusters and proposed a large-segment scheme for the I/O ring (the structure between the frontend and backend drivers for transferring I/O requests) to improve the performance perceived by applications. Spivak and Nasonov [23] suggested a distributed cache to preload data before its computation in Hadoop systems. They demonstrated (footnote 1: https://github.com/dohona/hadoop) that their method effectively reduces the execution time of MapReduce jobs, particularly when the I/O time is lower than the CPU-intensive phase.…”
Section: Related Work (mentioning)
confidence: 99%
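The condition noted above (preloading helps most when reading data takes less time than processing it) comes down to overlapping I/O with computation. The following hypothetical sketch shows that overlap in its simplest form; the BlockReader and BlockProcessor interfaces and the single background I/O thread are assumptions of this illustration, not Spivak and Nasonov's implementation.

// Illustration only: read the next input block in the background while the
// current block is processed. The overlap pays off when a block's read time
// is shorter than its processing time.
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PrefetchPipeline {
    interface BlockReader { byte[] read(String blockId) throws Exception; }
    interface BlockProcessor { void process(byte[] block); }

    static void run(List<String> blockIds, BlockReader reader, BlockProcessor processor)
            throws Exception {
        if (blockIds.isEmpty()) {
            return;
        }
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            Future<byte[]> next = io.submit(() -> reader.read(blockIds.get(0)));
            for (int i = 0; i < blockIds.size(); i++) {
                byte[] current = next.get();                  // wait for the preloaded block
                if (i + 1 < blockIds.size()) {
                    String id = blockIds.get(i + 1);
                    next = io.submit(() -> reader.read(id));  // preload the next block
                }
                processor.process(current);                   // CPU phase overlaps the read
            }
        } finally {
            io.shutdown();
        }
    }
}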
“…This shifts the consumption of resources, i.e., computing capacity and memory usage, from the level of the central grid to individual edge nodes and effectively performs data analytics in the smart grid [12]. However, the grid network pays a trade-off for this benefit and consumes substantial bandwidth in transporting very large datasets for aggregate MapReduce processing [13]. Moreover, the aggregate function incurs an operational latency, Latency_n = Network_i(PathDistance / ProcessingTime), in receiving data blocks through a multi-homing environment [14].…”
Section: Introduction (mentioning)
confidence: 99%
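The latency expression in the quoted fragment is garbled in the extracted text; one readable rendering is given below, with Latency_n read as the operational latency at node n and Network_i(.) as the contribution of the i-th network path in the multi-homed environment. This is an interpretation of the fragment, not a formula confirmed by the source.

\[
  \mathrm{Latency}_n \;=\; \mathrm{Network}_i\!\left(\frac{\mathrm{PathDistance}}{\mathrm{ProcessingTime}}\right)
\]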