2018
DOI: 10.1002/cpe.4786

Energy‐based tuning of convolutional neural networks on multi‐GPUs

Abstract: Deep Learning (DL) applications are gaining momentum in the realm of Artificial Intelligence, particularly after GPUs have demonstrated remarkable skills for accelerating their challenging computational requirements. Within this context, Convolutional Neural Network (CNN) models constitute a representative example of success on a wide set of complex applications, particularly on datasets where the target can be represented through a hierarchy of local features of increasing semantic complexity. In most of the …

Cited by 10 publications (5 citation statements) · References 44 publications
“…GPUs are extremely effective for several basic DL primitives, which include highly parallel operations such as activation functions, matrix multiplication, and convolutions [326][327][328][329][330]. Incorporating stacked HBM memory into up-to-date GPU models significantly increases memory bandwidth.…”
Section: GPU-based Approach (mentioning)
confidence: 99%
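The primitives named in that excerpt map directly onto GPU kernels in common DL frameworks. A minimal sketch, assuming PyTorch is available (the framework choice, tensor shapes, and filter sizes are illustrative, not taken from the cited work):

```python
# Illustrative only: the three DL primitives named in the excerpt above
# (convolution, activation function, matrix multiplication) each dispatch
# to highly parallel GPU kernels when the tensors live on the GPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(32, 3, 224, 224, device=device)   # a batch of images (assumed shape)
w = torch.randn(64, 3, 7, 7, device=device)       # convolution filters (assumed shape)

conv_out = torch.nn.functional.conv2d(x, w, stride=2, padding=3)  # convolution
act_out = torch.relu(conv_out)                                    # activation function
flat = act_out.flatten(start_dim=1)
proj = torch.randn(flat.shape[1], 10, device=device)
logits = flat @ proj                                              # matrix multiplication
print(logits.shape)  # torch.Size([32, 10])
```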
“…For comparison, a recent study [11] illustrated that performance per Watt on the Pascal GPU is 42 GFLOPs/Watt and on the Maxwell 23 GFLOPs/Watt for a similar machine learning problem. Whilst these results are significantly higher than those achieved in the micro-core LINPACK benchmark, crucially these two HPC grade GPUs draw a maximum of 250 Watts, whereas the power draw of the micro-cores used in our experiments was 0.90 Watts for the Epiphany and 0.18 Watts for the MicroBlaze.…”
Section: Experimentation Results (mentioning)
confidence: 99%
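To make the quoted efficiency figures concrete, a back-of-the-envelope comparison can be worked out. The numbers below are taken only from the excerpt above; everything else is illustrative arithmetic, not data from either paper:

```python
# Implied throughput and power-draw ratios from the figures quoted above:
# 42 and 23 GFLOPs/Watt (Pascal / Maxwell) at up to 250 W, versus micro-cores
# drawing 0.90 W (Epiphany) and 0.18 W (MicroBlaze).
gpus = {
    "Pascal GPU":  {"gflops_per_watt": 42.0, "power_w": 250.0},
    "Maxwell GPU": {"gflops_per_watt": 23.0, "power_w": 250.0},
}
for name, d in gpus.items():
    implied_gflops = d["gflops_per_watt"] * d["power_w"]  # throughput at full draw
    print(f"{name}: ~{implied_gflops:.0f} GFLOPs at {d['power_w']:.0f} W")

# The micro-cores deliver far lower absolute throughput, but their power draw
# is roughly three orders of magnitude smaller than the 250 W GPUs.
print(f"power ratio vs Epiphany:  {250.0 / 0.90:.0f}x")
print(f"power ratio vs MicroBlaze: {250.0 / 0.18:.0f}x")
```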
“…This machine combines a host dual-core ARM A9 CPU with 1 GB of RAM and the 16-core Epiphany III. Due to limitations in the Parallella, whilst the theoretical off-chip bandwidth of the Epiphany III is 600 MB/s, the maximum obtainable in practice is 150 MB/s [11]. For MicroBlaze experiments we use the Pynq-II SBC, mounting a Xilinx Zynq-7020 and 512 MB RAM.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…Li et al 57 analyze the energy and power behavior of CPUs and GPUs in deep learning training, and identify the power‐hungry layers in the neural network by quantifying the energy consumption of each CNN layer. Castro et al 58 use two CNN models (ResNet and AlexNet) to perform a combined energy and performance analysis on multi‐GPU settings. However, these two works mainly focus on the energy consumption of deep learning training.…”
Section: Related Work (mentioning)
confidence: 99%
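Both the per-layer accounting attributed to Li et al. and the combined energy/performance analysis of Castro et al. rest on sampling GPU power while the workload runs. A minimal sketch of that general idea, assuming pynvml and PyTorch on a CUDA machine; the layer, input sizes, and sampling loop are illustrative and do not reproduce the methodology of either paper:

```python
# Sample GPU board power via NVML while timing repeated forward passes of one
# layer, then estimate the energy of that interval as average power x time.
import time
import pynvml
import torch

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

layer = torch.nn.Conv2d(64, 128, kernel_size=3, padding=1).cuda()  # placeholder layer
x = torch.randn(32, 64, 56, 56, device="cuda")                     # placeholder input

torch.cuda.synchronize()
start = time.time()
samples = []
for _ in range(100):  # repeat to get a measurable interval
    y = layer(x)
    samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)  # mW -> W
torch.cuda.synchronize()
elapsed = time.time() - start

avg_power = sum(samples) / len(samples)
print(f"avg power {avg_power:.1f} W, energy ~{avg_power * elapsed:.2f} J over {elapsed:.3f} s")

pynvml.nvmlShutdown()
```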