2016
DOI: 10.1016/j.procs.2016.11.044

Data Preloading and Data Placement for MapReduce Performance Improving

Cited by 12 publications (3 citation statements)
References 13 publications
“…This algorithm reduces the delay of file transmission through file caching and transforms the optimization problem into a facility location problem. Spivak A et al. [19] proposed an approach for improving data placement. Their algorithm considers memory capacity, CPU count, and other node attributes, and uses the Hadoop Distributed File System (HDFS) cache to improve task performance.…”
Section: Cache Placement To Reduce Latency (mentioning)
confidence: 99%
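For context on the HDFS cache mentioned in the statement above: HDFS exposes centralized cache management through the DistributedFileSystem API, which lets an application ask DataNodes to keep a path's blocks pinned in memory before a job starts. The sketch below is only an illustration of that mechanism, not the cited paper's code; the NameNode URI, cache pool name, input path, and replication value are placeholders.

// Illustration only: pin a MapReduce job's input into the HDFS centralized
// cache so DataNodes hold the blocks in off-heap memory before computation.
// Requires HDFS cache-admin privileges for the pool operation.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class PreloadInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        DistributedFileSystem dfs = (DistributedFileSystem) fs;

        // A cache pool groups directives and carries quotas and permissions.
        dfs.addCachePool(new CachePoolInfo("mapreduce-preload"));

        // Ask the NameNode to cache the input directory's blocks on DataNodes.
        CacheDirectiveInfo directive = new CacheDirectiveInfo.Builder()
                .setPath(new Path("/jobs/wordcount/input"))
                .setPool("mapreduce-preload")
                .setReplication((short) 1)  // cached replicas per block
                .build();
        long directiveId = dfs.addCacheDirective(directive);
        System.out.println("Cache directive registered, id=" + directiveId);
    }
}

The same effect can be obtained from the command line with the hdfs cacheadmin tool (-addPool and -addDirective).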
“…Ko et al. [16] addressed the overhead of I/O data block processing in virtualized Hadoop clusters and proposed a large-segment scheme for the I/O ring (the structure between the frontend and backend drivers for transferring I/O requests) to improve the performance perceived by applications. Spivak and Nasonov [23] suggested a distributed cache to preload data before its computation in Hadoop systems. They demonstrated (footnote 1: https://github.com/dohona/hadoop) that their method effectively reduces the execution time of MapReduce jobs, particularly when the I/O time is lower than the CPU-intensive phase.…”
Section: Related Work (mentioning)
confidence: 99%
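The condition noted above (preloading helps most when reading data takes less time than processing it) comes down to overlapping I/O with computation. The following hypothetical sketch shows that overlap in its simplest form; the BlockReader and BlockProcessor interfaces and the single background I/O thread are assumptions of this illustration, not Spivak and Nasonov's implementation.

// Illustration only: read the next input block in the background while the
// current block is processed. The overlap pays off when a block's read time
// is shorter than its processing time.
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PrefetchPipeline {
    interface BlockReader { byte[] read(String blockId) throws Exception; }
    interface BlockProcessor { void process(byte[] block); }

    static void run(List<String> blockIds, BlockReader reader, BlockProcessor processor)
            throws Exception {
        if (blockIds.isEmpty()) {
            return;
        }
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            Future<byte[]> next = io.submit(() -> reader.read(blockIds.get(0)));
            for (int i = 0; i < blockIds.size(); i++) {
                byte[] current = next.get();                  // wait for the preloaded block
                if (i + 1 < blockIds.size()) {
                    String id = blockIds.get(i + 1);
                    next = io.submit(() -> reader.read(id));  // preload the next block
                }
                processor.process(current);                   // CPU phase overlaps the read
            }
        } finally {
            io.shutdown();
        }
    }
}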
“…This shifts the consumption of resources, i.e., computing capacity and memory usage, from the level of the central grid to individual edge nodes and effectively performs data analytics in the smart grid [12]. However, the grid network pays a trade-off for this benefit and consumes substantial bandwidth in transporting very large datasets for aggregate MapReduce processing [13]. Moreover, the aggregate function incurs an operational latency, Latency_n = Network_i(PathDistance / ProcessingTime), in receiving data blocks through a multi-homing environment [14].…”
Section: Introduction (mentioning)
confidence: 99%
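The latency expression in the quoted fragment is garbled in the extracted text; one readable rendering is given below, with Latency_n read as the operational latency at node n and Network_i(.) as the contribution of the i-th network path in the multi-homed environment. This is an interpretation of the fragment, not a formula confirmed by the source.

\[
  \mathrm{Latency}_n \;=\; \mathrm{Network}_i\!\left(\frac{\mathrm{PathDistance}}{\mathrm{ProcessingTime}}\right)
\]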