2016
DOI: 10.1007/978-3-319-49583-5_47

Modeling Performance of Hadoop Applications: A Journey from Queueing Networks to Stochastic Well Formed Nets

Abstract: Nowadays, many enterprises commit to the extraction of actionable knowledge from huge datasets as part of their core business activities. Applications belong to very different domains such as fraud detection or one-to-one marketing, and encompass business analytics and support to decision making in both private and public sectors. In these scenarios, a central place is held by the MapReduce framework and in particular its open source implementation, Apache Hadoop. In such environments, new challenges arise in …

Cited by 27 publications (37 citation statements) · References 30 publications
“…When compared with the literature, JMT or GreatSPN for the same models can take up to one hour without obtaining greater accuracy (see [5] for additional details). On the other hand, on equivalent scenarios, the Task Precedence model performed quite well in terms of model solving time (always around one second).…”
Section: Summary of Results (mentioning)
confidence: 99%
“…Finally, the authors in [4] describe multiple queueing network models (simulated with JMT) and stochastic well formed nets (simulated with GreatSPN) to model MapReduce applications, highlighting the trade-offs and the additional complexity required to capture system behavior and improve prediction accuracy. As a result, general-purpose simulators such as GreatSPN and JMT are not suitable for efficiently studying massively parallel applications involving tens (or even hundreds) of stages and thousands of parallel tasks per stage.…”
Section: Simulation Approaches (mentioning)
confidence: 99%
“…Our tool is a distributed software system designed to exploit multi-core and multi-host architectures to work at a high degree of parallelism. In particular, it features a presentation layer (integrated in the IDE) devoted to managing the interactions with users and with other components of the DICE ecosystem, an optimization service (colored gray), which transforms the inputs into suitable performance models [18] and implements the optimization strategy, and a horizontally scalable assessment service (colored green in the picture), which abstracts the performance evaluation from the particular solver used. Currently, a QN simulator (JMT [25]), an SPN simulator (GreatSPN [26]), and a discrete event simulator (dagSim [19]) are supported.…”
Section: D-SPACE4Cloud Architecture (mentioning)
confidence: 99%
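The assessment service described above decouples the optimization loop from any particular simulator. A minimal sketch of such a solver-agnostic layer is shown below; all class and method names here are hypothetical illustrations, not taken from the D-SPACE4Cloud codebase:

```python
from abc import ABC, abstractmethod


class PerformanceSolver(ABC):
    """Abstracts one evaluation backend (e.g. a QN, SPN, or DAG simulator)."""

    @abstractmethod
    def evaluate(self, model: dict) -> float:
        """Return the predicted response time for a performance model."""


class ConstantSolver(PerformanceSolver):
    """Toy stand-in for a real simulator such as JMT or GreatSPN."""

    def __init__(self, prediction: float):
        self.prediction = prediction

    def evaluate(self, model: dict) -> float:
        return self.prediction


class AssessmentService:
    """Dispatches evaluations to whichever registered solver is requested,
    so the optimization strategy never depends on a specific simulator."""

    def __init__(self) -> None:
        self._solvers: dict[str, PerformanceSolver] = {}

    def register(self, name: str, solver: PerformanceSolver) -> None:
        self._solvers[name] = solver

    def assess(self, name: str, model: dict) -> float:
        return self._solvers[name].evaluate(model)


service = AssessmentService()
service.register("qn", ConstantSolver(12.5))
print(service.assess("qn", {"tasks": 100}))  # 12.5
```

Because each backend only has to implement `evaluate`, new solvers (as the quote notes, dagSim was added alongside JMT and GreatSPN) can be plugged in without touching the optimization code.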
“…The underlying optimization problem is NP-hard and is tackled by a simulation-optimization procedure able to determine an optimized configuration for a cluster managed by the YARN Capacity Scheduler [17]. DIA execution times are estimated by relying on multiple models, including machine learning (ML) and simulation based on queueing networks (QNs), stochastic Petri nets (SPNs) [18], as well as an ad hoc discrete event simulator, dagSim [19], especially designed for the analysis of applications involving a number of stages linked by directed acyclic graphs (DAGs) of precedence constraints. This property is common to legacy MapReduce jobs, workloads based on Apache Tez, and Spark-based applications.…”
Section: Introduction (mentioning)
confidence: 99%
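The core property exploited by dagSim-style analysis is that a stage can start only after all of its DAG predecessors finish, so the application makespan follows from the precedence structure. A small illustrative sketch (the three-stage DAG and the durations are made-up example data, not from the cited work):

```python
# Hypothetical stage DAG: stage -> list of predecessor stages.
predecessors = {
    "map": [],
    "shuffle": ["map"],
    "reduce": ["shuffle"],
}

# Hypothetical per-stage durations, in seconds.
duration = {"map": 40.0, "shuffle": 10.0, "reduce": 25.0}


def finish_time(stage: str) -> float:
    """A stage starts once every predecessor has completed."""
    start = max((finish_time(p) for p in predecessors[stage]), default=0.0)
    return start + duration[stage]


# Makespan = latest finish time over all stages.
makespan = max(finish_time(s) for s in predecessors)
print(makespan)  # 75.0
```

Real DAGs from Tez or Spark applications have many parallel branches, but the same recursion over precedence constraints applies.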
“…Although MapReduce job performance metrics can be evaluated, for example, by relying on simulations [14,27], there is a fundamental trade-off between the accuracy of the models and the time required to run them. Given the need to compute capacity allocation at scale (Hadoop clusters nowadays run thousands of jobs a day [48]), the high complexity of simulating even small-scale instances of MapReduce jobs has prevented us from exploiting such results here.…”
Section: Approximate Formulae for MapReduce Execution Time (mentioning)
confidence: 99%
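To illustrate the kind of closed-form approximation that sidesteps costly simulation, the sketch below uses the classic makespan bounds for n independent tasks greedily scheduled on k identical slots. These are a widely used family of bounds, not necessarily the exact formulae adopted in the cited paper:

```python
def makespan_bounds(task_durations: list[float], slots: int) -> tuple[float, float]:
    """Lower/upper bounds on the makespan of independent tasks greedily
    scheduled on `slots` identical slots. Illustrative only; the cited
    work's approximate formulae may differ."""
    n = len(task_durations)
    avg = sum(task_durations) / n
    worst = max(task_durations)
    lower = n * avg / slots                 # perfect load balance
    upper = (n - 1) * avg / slots + worst   # greedy worst case
    return lower, upper


lo, hi = makespan_bounds([4.0, 6.0, 5.0, 5.0], slots=2)
print(lo, hi)  # 10.0 13.5
```

Evaluating such formulae takes microseconds per configuration, which is why they scale to the capacity-planning workloads (thousands of jobs per day) mentioned in the quote, at the cost of the accuracy a full QN or SPN simulation could provide.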