2020
DOI: 10.1186/s40537-020-00319-4
Estimating runtime of a job in Hadoop MapReduce

Abstract: Nowadays, with the emergence and use of new systems, we face a massive amount of data. Due to the volume, velocity, and variety of these big data, managing, maintaining, and processing them require special infrastructures. One of the best-known open-source frameworks is Apache Hadoop [1], a scalable and reliable framework for storing and processing big data. Hadoop divides the large input data into fixed-size pieces, then stores and processes these splits of data on a cluster of machines. By default, each split copies …
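The splitting behavior the abstract describes can be sketched as follows. This is a minimal illustration, not Hadoop API code: the 128 MB split size and replication factor of 3 are Hadoop's common defaults, and the function names are assumptions made for this sketch.

```python
# Illustrative sketch of Hadoop's input splitting and replication.
# DEFAULT_SPLIT_SIZE and DEFAULT_REPLICATION reflect common Hadoop
# defaults; the function names are not part of any Hadoop API.
import math

DEFAULT_SPLIT_SIZE = 128 * 1024 * 1024   # 128 MB, the default HDFS block/split size
DEFAULT_REPLICATION = 3                  # the default HDFS replication factor


def num_splits(input_bytes, split_size=DEFAULT_SPLIT_SIZE):
    """Number of fixed-size splits a job's input is divided into."""
    return math.ceil(input_bytes / split_size)


def stored_bytes(input_bytes, replication=DEFAULT_REPLICATION):
    """Total bytes stored in the cluster once every split is replicated."""
    return input_bytes * replication


one_gib = 1024 ** 3
print(num_splits(one_gib))      # 8 splits of 128 MB each
print(stored_bytes(one_gib))    # 3 GiB stored across the cluster
```

Under these defaults, a 1 GiB input yields 8 splits, and replication triples the storage footprint; the actual values are configurable per cluster.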

Cited by 12 publications (2 citation statements)
References 15 publications
“…This work considers using ML oracle to predict job sizes. Recent works (Amiri and Mohammad-Khanli 2017;Peyravi and Moeini 2020;Yamashiro and Nonaka 2021) have shown that job sizes are highly predictable in many scenarios, e.g., cloud, clusters, and factories. In addition, when the prediction is accurate, we have the 2-relaxed decision procedure guaranteeing a near-optimal makespan, and when the prediction goes arbitrarily bad, the existing O(log m)-competitive algorithm can bound the performance.…”
Section: Oracle and Prediction Error
confidence: 99%
“…Zhu et al. [16] proposed BestConfig, which uses the divide-and-diverge sampling method and the recursive-bound-and-search method for parameter tuning of general systems with resource constraints. Peyravi et al. [17] estimated the runtime of a MapReduce job by considering three categories of parameters that have a higher impact on the runtime. They modeled the runtime of each phase of the Hadoop execution pipeline using a weighting system based on job history.…”
Section: Related Work
confidence: 99%
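The per-phase weighting idea attributed to Peyravi et al. above could be sketched as follows. This is an assumption-laden illustration, not the paper's exact model: the phase names, the use of a simple historical average per phase, and the fixed weights are all placeholders for whatever the paper actually derives from job history.

```python
# Illustrative sketch (not the cited paper's exact model): estimate a
# job's runtime as a weighted sum of average per-phase runtimes taken
# from the job's execution history.

def estimate_runtime(history, weights):
    """Estimate total runtime in seconds.

    history: list of dicts mapping phase name -> observed runtime (s)
             for past runs of similar jobs.
    weights: dict mapping phase name -> weight (assumed to sum to 1).
    """
    estimate = 0.0
    for phase, w in weights.items():
        runs = [job[phase] for job in history if phase in job]
        avg = sum(runs) / len(runs) if runs else 0.0
        estimate += w * avg
    return estimate


# Hypothetical job history: two past runs with map/shuffle/reduce times.
history = [
    {"map": 120.0, "shuffle": 40.0, "reduce": 60.0},
    {"map": 100.0, "shuffle": 60.0, "reduce": 80.0},
]
# Hypothetical phase weights; a history-derived scheme would learn these.
weights = {"map": 0.5, "shuffle": 0.2, "reduce": 0.3}
print(estimate_runtime(history, weights))  # 86.0
```

Here the phase averages are 110 s (map), 50 s (shuffle), and 70 s (reduce), so the weighted estimate is 0.5·110 + 0.2·50 + 0.3·70 = 86 s.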