2019
DOI: 10.1016/j.is.2019.01.006

Predictive performance modeling for distributed batch processing using black box monitoring and machine learning

Abstract: In many domains, the previous decade was characterized by increasing data volumes and growing complexity of computational workloads, creating new demands for highly data-parallel computing in distributed systems. Effective operation of these systems is challenging when facing uncertainties about the performance of jobs and tasks under varying resource configurations, e.g., for scheduling and resource allocation. We survey predictive performance modeling (PPM) approaches to estimate performance metrics such as…

Cited by 57 publications (23 citation statements)
References 119 publications
“…Handling the performance variability in multi-tenant distributed systems is essentials to the multiple workflows scheduling problems as the scheduling highly relies on the accurate estimation of workflow's performance on a particular computational infrastructure. Attempts to increase the quality of scheduling by accurately estimating the time needed for completing a task, as one of the strategies for taking care of the uncertainty, has been extensively studied (Witt et al 2019). Specific work designed for scientific workflow includes a work by Nadeem and Fahringer (Nadeem and Fahringer 2009) that utilized the template to predict the scientific workflow applications execution time.…”
Section: Deployment Model Taxonomy (mentioning)
confidence: 99%
“…Their focus is on resource provisioning in the cloud. Witt et al [5] survey Predictive Performance Modeling (PPM) approaches in the area of distributed computing and give an overview of the state-of-the-art in that field. Koziolek [6] gives an overview of performance prediction and measurement approaches with a focus on component-based software systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…Witt et al [5] survey PPM approaches in the area of distributed computing and give an overview of the state-of-the-art in that field. Note that the referenced survey focuses on general performance prediction, not solely on failure prediction in the context of batch-processing.…”
Section: Online Prediction (mentioning)
confidence: 99%
“…The performance modeling systems require knowledge of the source code and an analytical model of the slowest parts of the code (Xu et al, 1996). Many systems exist to model the performance of distributed jobs (Barnes et al, 2008; Xu et al, 1996; Kuperberg et al, 2008; Witt et al, 2018), with some employing Black Box testing (Yang et al, 2005; Kavulya et al, 2010) or tests on scientific benchmark cases (Carrington et al, 2006). Such performance analysis does not require intimate knowledge of the software and can be applied on data obtained from processing on a grid infrastructure.…”
Section: Related Work (mentioning)
confidence: 99%