Deep Learning applications are pervasive today, and efficient strategies have been designed to reduce the computational time and resource demands of the training process. The Distributed Deep Learning (DDL) paradigm yields a significant speed-up by partitioning the training into multiple parallel tasks. The Ray framework supports DDL applications that exploit data parallelism, enhancing scalability with minimal user effort. This work evaluates the performance of DDL training applications by profiling their execution on a Ray cluster and by developing Machine Learning-based models that predict the training time as the dataset size, the number of parallel workers, and the amount of computational resources vary. Such performance-prediction models are crucial for forecasting computational resource usage and costs in Cloud environments. Experimental results show that our models achieve average prediction errors between 3% and 15% for both interpolation and extrapolation, demonstrating their applicability to previously unseen scenarios.
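For context, the sketch below illustrates the kind of data-parallel training job considered in this work, scaled out over multiple workers on a Ray cluster. It is a minimal example assuming Ray 2.x's Ray Train API (TorchTrainer, ScalingConfig); the training function, model, and synthetic data are illustrative and not taken from the evaluated applications.

```python
# Illustrative sketch (not the paper's code): data-parallel training with Ray Train.
import torch
import torch.nn as nn
import ray.train.torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_func(config):
    # Simple model; Ray Train wraps it for distributed data-parallel execution.
    model = ray.train.torch.prepare_model(nn.Linear(16, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
    loss_fn = nn.MSELoss()

    # Synthetic dataset; the prepared loader shards batches across workers.
    dataset = torch.utils.data.TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
    loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)
    loader = ray.train.torch.prepare_data_loader(loader)

    for _ in range(config["epochs"]):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()


# Scale out over 4 data-parallel workers (CPU-only in this sketch).
trainer = TorchTrainer(
    train_func,
    train_loop_config={"lr": 1e-2, "epochs": 2},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=False),
)
result = trainer.fit()
```

Varying num_workers, the dataset size, and the per-worker resources in such a job are exactly the dimensions along which the proposed models predict training time.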