Proceedings of the 9th International Conference on Autonomic Computing 2012
DOI: 10.1145/2371536.2371546

Automated profiling and resource management of Pig programs for meeting service level objectives

Abstract: An increasing number of MapReduce applications associated with live business intelligence require completion time guarantees. In this paper, we consider the popular Pig framework that provides a high-level SQL-like abstraction on top of the MapReduce engine. Programs written in this framework are compiled into directed acyclic graphs (DAGs) of MapReduce jobs. There is a lack of performance models and analysis tools for automated performance management of such MapReduce jobs. We offer a performance modeling environ…
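For intuition, the sketch below (all names and the scheduling assumption are illustrative, not the paper's code) shows how a completion-time estimate for a Pig program might be obtained once the program has been compiled into a DAG of MapReduce jobs, assuming each job's running time has already been estimated from its profile and that independent jobs can run fully in parallel:

```python
from collections import defaultdict

def critical_path_time(jobs, edges):
    """jobs: {job_id: estimated running time (s)}; edges: (upstream, downstream) pairs.
    Returns the critical-path length through the DAG, i.e. an estimate of the
    DAG completion time when independent jobs run fully in parallel."""
    succ = defaultdict(list)
    indeg = {j: 0 for j in jobs}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    earliest = {j: 0.0 for j in jobs}   # earliest possible start of each job
    ready = [j for j in jobs if indeg[j] == 0]
    finish = 0.0
    while ready:
        j = ready.pop()
        done = earliest[j] + jobs[j]
        finish = max(finish, done)
        for c in succ[j]:
            earliest[c] = max(earliest[c], done)
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return finish

# Hypothetical example: a Pig script compiled into three jobs, where J3 joins
# the outputs of J1 and J2. The estimate is max(120, 90) + 60 = 180 seconds.
jobs = {"J1": 120.0, "J2": 90.0, "J3": 60.0}
edges = [("J1", "J3"), ("J2", "J3")]
print(critical_path_time(jobs, edges))   # 180.0
```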

Cited by 33 publications (21 citation statements). References 17 publications.
“…It then finds the minimum number of slots that are required to meet a running time constraint using Lagrange multipliers. Extensions have appeared in [18], where trade-off curves between running time and monetary cost are provided to the user, who makes the final choice, and in [19], where the number of map and reduce slots are optimally decided. All these techniques are specific to a MapReduce setting running on cloud machines and assume an analytical cost model that is even simpler than the one in [7], which is extended by our work.…”
Section: Related Work (mentioning, confidence: 99%)
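As a rough illustration of the slot-sizing idea in the statement above, the following sketch enumerates map/reduce slot allocations and returns the smallest total that still meets a deadline. The analytical cost model and all parameter names are simplifying assumptions for illustration; the cited techniques solve this analytically (e.g. with Lagrange multipliers) rather than by enumeration.

```python
def estimated_time(n_map, n_red, m_avg, r_avg, map_slots, red_slots):
    # Toy analytical cost model: map phase followed by reduce phase, each
    # phase's duration scaling inversely with the slots assigned to it.
    return n_map * m_avg / map_slots + n_red * r_avg / red_slots

def min_slots_for_deadline(n_map, n_red, m_avg, r_avg, deadline, max_slots=64):
    """Smallest (map_slots, reduce_slots) pair, by total slot count, whose
    estimated completion time meets the deadline; None if infeasible."""
    best = None
    for m in range(1, max_slots + 1):
        for r in range(1, max_slots + 1):
            if estimated_time(n_map, n_red, m_avg, r_avg, m, r) <= deadline:
                if best is None or m + r < sum(best):
                    best = (m, r)
    return best

# Hypothetical example: 200 map tasks averaging 30 s, 50 reduce tasks
# averaging 40 s, and a 600 s deadline.
print(min_slots_for_deadline(200, 50, 30.0, 40.0, 600.0))   # (15, 10) under this toy model
```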
“…Enforcing high-level scheduling policies and fair sharing have been explored in the context of distributed storage systems [30,31,62,67,73,74]; however, they typically consider simpler execution structures (e.g., client to server) whereas Wisp focuses on a general DAG wherein individual processes lack end-to-end visibility. Lastly, several proposals exist for optimizing job completion times for DAGs of tasks in big-data systems [11,28,77,78]. However, data analytics jobs are often orders of magnitude longer than those serviced by the SOA systems targeted by Wisp (which operate under the additional constraint of limited end-to-end visibility).…”
Section: Related Work (mentioning, confidence: 99%)
“…A MapReduce performance model relying on a compact job profile definition to calculate a lower bound, an upper bound, and an estimate of job execution time is presented. Finally, such a model, improved in [32], is validated through a simulation study and an experimental campaign on a 66-node Hadoop cluster.…”
Section: Related Work (mentioning, confidence: 99%)
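A hedged sketch of this style of bound-based model follows; the profile fields and formulas are patterned on the compact-job-profile idea described in the statement above, not a verbatim reproduction of the cited model.

```python
from dataclasses import dataclass

@dataclass
class JobProfile:
    n_map: int      # number of map tasks
    n_red: int      # number of reduce tasks
    m_avg: float    # average map task duration (s)
    m_max: float    # maximum map task duration (s)
    r_avg: float    # average reduce task duration (s)
    r_max: float    # maximum reduce task duration (s)

def completion_bounds(p, map_slots, red_slots):
    """Lower bound, upper bound, and a midpoint estimate of job completion
    time when the job runs with the given numbers of map and reduce slots."""
    low = p.n_map * p.m_avg / map_slots + p.n_red * p.r_avg / red_slots
    up = ((p.n_map - 1) * p.m_avg / map_slots + p.m_max
          + (p.n_red - 1) * p.r_avg / red_slots + p.r_max)
    return low, up, (low + up) / 2.0

# Hypothetical example: 200 maps (avg 30 s, max 50 s) and 50 reduces
# (avg 40 s, max 70 s) on 20 map slots and 10 reduce slots.
prof = JobProfile(200, 50, 30.0, 50.0, 40.0, 70.0)
print(completion_bounds(prof, 20, 10))   # (500.0, 614.5, 557.25)
```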