Comprehensive regression-based model to predict performance of general-purpose graphics processing unit

Deep Learning (DL) is moving towards deploying workloads not only in cloud datacenters but also on local devices. Although these deployments are mostly limited to inference tasks, they still widen the range of possible target architectures significantly. Additionally, these new targets usually come with drastically reduced computation performance and memory sizes compared to the traditionally used architectures, and they put the key optimization focus on efficiency, as they often depend on batteries. To help developers quickly estimate the performance of a neural network during its design phase, performance models could be used. However, these models are expensive to implement, as they require in-depth knowledge about the hardware architecture and the algorithms used. Although AI-based solutions exist, these either require large datasets that are difficult to collect on the low-performance targets and/or are limited to a small number of target platforms and metrics. Our solution exploits the block-based structure of neural networks, as well as the high similarity in the typically used layer configurations across neural networks, enabling the training of accurate models on significantly smaller datasets. In addition, our solution is not limited to a specific architecture or metric. We showcase the feasibility of the solution on a set of seven devices from four different hardware architectures, and with up to three performance metrics per target, including the power consumption and memory footprint. Our tests have shown that the solution achieved an error of less than 1 ms (2.6%) in latency, 0.12 J (4%) in energy consumption and 11 MiB (1.5%) in memory allocation for the whole network inference prediction, while being up to five orders of magnitude faster than a benchmark.


“…Shafiabadi et al [23,24] uses regression models to estimate the performance of OpenCL [17] programs on a specific AMD GPU architecture.…”

Section: Performance Prediction For General Purpose Applications (mentioning)

confidence: 99%

AI-Driven Performance Modeling for AI Inference Workloads

2022


Performance prediction of deep learning applications training in GPU as a service systems

et al. 2022


growth rate of over 38% to support 3D models, animated video processing, and gaming. GPUaaS adoption will also be boosted by the use of graphics processing units (GPUs) to support deep learning (DL) model training. Indeed, the main cloud providers already offer in their catalogs GPU-based virtual machines pre-installed with popular DL frameworks (like Torch, PyTorch, TensorFlow, and Caffe), simplifying DL model programming operations. Motivated by these considerations, this paper studies GPU-deployed neural networks (NNs) and tackles the issue of performance prediction, particularly with respect to NN training times. The proposed approach is based on machine learning and exploits two main sets of features, which describe, on the one hand, the network architecture and the hyper-parameters and, on the other, the hardware characteristics of the target deployment. Such data enable the learning of multiple linear regression models, which, coupled with an established feature selection technique, become accurate prediction tools, with errors below 11% on average. An extensive experimental campaign, performed both on public and in-house private cloud deployments, considers popular deep NNs used for image classification and speech transcription and shows that prediction errors remain small even when extrapolating outside the range spanned by the input data. This has important implications for the models' applicability: in this way, it is possible to investigate the impact on performance of different GPUaaS deployments or hardware upgrades without conducting an empirical investigation on the specific target device, or to evaluate the changes in training time when the number of inner modules in the deep neural networks varies.
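As a rough illustration of the multiple-linear-regression idea described in this abstract, the sketch below fits an ordinary least-squares model that maps a few network and hardware features to a training time, then measures the mean absolute percentage error on held-out samples. Everything in it is an assumption made for illustration: the three features (number of inner modules, batch size, GPU peak TFLOPS), the synthetic data, the coefficients, and the helper names `fit_linear_model`, `predict`, and `mape` do not come from the paper, and the feature selection step the abstract mentions is omitted.

```python
import numpy as np

def fit_linear_model(X, y):
    """Fit y ~ X @ w + b by ordinary least squares and return (w, b)."""
    A = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def predict(X, w, b):
    """Evaluate the fitted linear model on a feature matrix."""
    return X @ w + b

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Synthetic data only: feature meanings and coefficients are hypothetical.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(1, 20, n),   # hypothetical: number of inner modules
    rng.integers(8, 256, n),  # hypothetical: batch size
    rng.uniform(5, 30, n),    # hypothetical: GPU peak TFLOPS
]).astype(float)
true_w = np.array([3.0, 0.05, -0.4])
y = X @ true_w + 50.0 + rng.normal(0, 0.5, n)  # made-up "training time"

# Fit on the first 150 samples, evaluate on the remaining 50.
w, b = fit_linear_model(X[:150], y[:150])
error = mape(y[150:], predict(X[150:], w, b))
```

Because the synthetic target really is linear in the features, the held-out error here is small by construction; it says nothing about the accuracy reported in the abstract, which depends on the authors' own features and measurements.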


Toward a general framework for jointly processor-workload empirical modeling

Sheidaeian, Fatemi 2020, J Supercomput


Comprehensive regression-based model to predict performance of general-purpose graphics processing unit

Cited by 5 publications

References 26 publications

