2016
DOI: 10.1007/978-3-319-49583-5_47

Modeling Performance of Hadoop Applications: A Journey from Queueing Networks to Stochastic Well Formed Nets

Abstract: Nowadays, many enterprises commit to the extraction of actionable knowledge from huge datasets as part of their core business activities. Applications belong to very different domains such as fraud detection or one-to-one marketing, and encompass business analytics and support to decision making in both private and public sectors. In these scenarios, a central place is held by the MapReduce framework and in particular its open source implementation, Apache Hadoop. In such environments, new challenges arise in …

Cited by 27 publications (37 citation statements) · References 30 publications
“…When compared with the literature, JMT or GreatSPN for the same models can take up to one hour without obtaining greater accuracy (see [5] for additional details). On the other hand, on equivalent scenarios, the Task Precedence model performed quite well in terms of model solving time (always around one second).…”
Section: Summary of Results (mentioning)
confidence: 99%
“…Finally, the authors in [4] describe multiple queueing network models (simulated with JMT) and stochastic well formed nets (simulated with GreatSPN) to model MapReduce applications, highlighting the trade-offs and the additional complexity required to capture system behavior and improve prediction accuracy. As a result, general-purpose simulators such as GreatSPN and JMT are not suitable for efficiently studying massively parallel applications involving tens (or even hundreds) of stages and thousands of parallel tasks per stage.…”
Section: Simulation Approaches (mentioning)
confidence: 99%
“…Our tool is a distributed software system designed to exploit multi-core and multi-host architectures to work at a high degree of parallelism. In particular, it features a presentation layer (integrated in the IDE) devoted to managing the interactions with users and with other components of the DICE ecosystem, an optimization service (colored gray), which transforms the inputs into suitable performance models [18] and implements the optimization strategy, and a horizontally scalable assessment service (colored green in the picture), which abstracts the performance evaluation from the particular solver used. Currently, a QN simulator (JMT [25]), an SPN simulator (GreatSPN [26]), and a discrete event simulator (dagSim [19]) are supported.…”
Section: D-SPACE4Cloud Architecture (mentioning)
confidence: 99%
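The assessment service described above decouples the optimization loop from any particular simulator. A minimal sketch of such a solver-agnostic layer is shown below; all class and method names here are hypothetical illustrations, not taken from the D-SPACE4Cloud codebase:

```python
from abc import ABC, abstractmethod


class PerformanceSolver(ABC):
    """Abstracts one evaluation backend (e.g. a QN, SPN, or DAG simulator)."""

    @abstractmethod
    def evaluate(self, model: dict) -> float:
        """Return the predicted response time for a performance model."""


class ConstantSolver(PerformanceSolver):
    """Toy stand-in for a real simulator such as JMT or GreatSPN."""

    def __init__(self, prediction: float):
        self.prediction = prediction

    def evaluate(self, model: dict) -> float:
        return self.prediction


class AssessmentService:
    """Dispatches evaluations to whichever registered solver is requested,
    so the optimization strategy never depends on a specific simulator."""

    def __init__(self) -> None:
        self._solvers: dict[str, PerformanceSolver] = {}

    def register(self, name: str, solver: PerformanceSolver) -> None:
        self._solvers[name] = solver

    def assess(self, name: str, model: dict) -> float:
        return self._solvers[name].evaluate(model)


service = AssessmentService()
service.register("qn", ConstantSolver(12.5))
print(service.assess("qn", {"tasks": 100}))  # 12.5
```

Because each backend only has to implement `evaluate`, new solvers (as the quote notes, dagSim was added alongside JMT and GreatSPN) can be plugged in without touching the optimization code.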
“…The underlying optimization problem is NP-hard and is tackled by a simulation-optimization procedure able to determine an optimized configuration for a cluster managed by the YARN Capacity Scheduler [17]. DIA execution times are estimated by relying on multiple models, including machine learning (ML) and simulation based on queueing networks (QNs), stochastic Petri nets (SPNs) [18], as well as an ad hoc discrete event simulator, dagSim [19], especially designed for the analysis of applications involving a number of stages linked by directed acyclic graphs (DAGs) of precedence constraints. This property is common to legacy MapReduce jobs, workloads based on Apache Tez, and Spark-based applications.…”
Section: Introduction (mentioning)
confidence: 99%
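The core property exploited by dagSim-style analysis is that a stage can start only after all of its DAG predecessors finish, so the application makespan follows from the precedence structure. A small illustrative sketch (the three-stage DAG and the durations are made-up example data, not from the cited work):

```python
# Hypothetical stage DAG: stage -> list of predecessor stages.
predecessors = {
    "map": [],
    "shuffle": ["map"],
    "reduce": ["shuffle"],
}

# Hypothetical per-stage durations, in seconds.
duration = {"map": 40.0, "shuffle": 10.0, "reduce": 25.0}


def finish_time(stage: str) -> float:
    """A stage starts once every predecessor has completed."""
    start = max((finish_time(p) for p in predecessors[stage]), default=0.0)
    return start + duration[stage]


# Makespan = latest finish time over all stages.
makespan = max(finish_time(s) for s in predecessors)
print(makespan)  # 75.0
```

Real DAGs from Tez or Spark applications have many parallel branches, but the same recursion over precedence constraints applies.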
“…Although MapReduce job performance metrics can be evaluated, for example, by relying on simulations [14,27], there is a fundamental trade-off between the accuracy of the models and the time required to run them. Given the need to compute capacity allocation at scale (Hadoop clusters nowadays run thousands of jobs a day [48]), the high complexity of simulating even small-scale instances of MapReduce jobs has prevented us from exploiting such results here.…”
Section: Approximate Formulae for MapReduce Execution Time (mentioning)
confidence: 99%
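To illustrate the kind of closed-form approximation that sidesteps costly simulation, the sketch below uses the classic makespan bounds for n independent tasks greedily scheduled on k identical slots. These are a widely used family of bounds, not necessarily the exact formulae adopted in the cited paper:

```python
def makespan_bounds(task_durations: list[float], slots: int) -> tuple[float, float]:
    """Lower/upper bounds on the makespan of independent tasks greedily
    scheduled on `slots` identical slots. Illustrative only; the cited
    work's approximate formulae may differ."""
    n = len(task_durations)
    avg = sum(task_durations) / n
    worst = max(task_durations)
    lower = n * avg / slots                 # perfect load balance
    upper = (n - 1) * avg / slots + worst   # greedy worst case
    return lower, upper


lo, hi = makespan_bounds([4.0, 6.0, 5.0, 5.0], slots=2)
print(lo, hi)  # 10.0 13.5
```

Evaluating such formulae takes microseconds per configuration, which is why they scale to the capacity-planning workloads (thousands of jobs per day) mentioned in the quote, at the cost of the accuracy a full QN or SPN simulation could provide.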