2017 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)
DOI: 10.1109/pdp.2017.98

On the Evaluation of Energy-Efficient Deep Learning Using Stacked Autoencoders on Mobile GPUs

Abstract: Over recent years, deep learning architectures have gained attention by winning important international detection and classification challenges. However, due to their high energy consumption, the need to run them on low-power devices at acceptable throughput is greater than ever. This paper addresses the problem by introducing energy-efficient deep learning based on local training and on low-power mobile GPU parallel architectures, all conveniently supported by the same high-level description…
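The approach outlined in the abstract, local (greedy layer-wise) training of stacked autoencoders so that each shallow layer can be trained independently on a low-power device, can be illustrated with a minimal sketch. The numpy code below is a hypothetical illustration under assumed layer sizes, tied weights and learning rate; it is not the authors' implementation and does not use their GPU back-end.

    # Minimal sketch of greedy layer-wise ("local") training of a stacked autoencoder.
    # All hyperparameters are illustrative assumptions, not taken from the paper.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_ae_layer(X, hidden, epochs=50, lr=0.1, seed=0):
        """Train one tied-weight autoencoder layer on X; return weights, bias and codes."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        W = rng.normal(0.0, 0.1, (d, hidden))
        b_h, b_o = np.zeros(hidden), np.zeros(d)
        for _ in range(epochs):
            H = sigmoid(X @ W + b_h)          # encode
            R = sigmoid(H @ W.T + b_o)        # decode with tied weights
            dR = (R - X) * R * (1.0 - R)      # gradient at the reconstruction
            dH = (dR @ W) * H * (1.0 - H)     # gradient at the hidden layer
            W -= lr * (X.T @ dH + dR.T @ H) / n
            b_h -= lr * dH.mean(axis=0)
            b_o -= lr * dR.mean(axis=0)
        return W, b_h, sigmoid(X @ W + b_h)

    # Stack the layers: each layer is trained only on the codes of the previous one,
    # so every training step stays local to a single shallow model.
    X = np.random.default_rng(1).random((256, 64))   # placeholder data
    codes = X
    for hidden in (32, 16):                           # assumed layer sizes
        W, b_h, codes = train_ae_layer(codes, hidden)

Because each step touches only one shallow layer's weights and the previous layer's activations, the working set stays small, which is what makes this style of training attractive for memory-constrained mobile GPUs.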

Cited by 10 publications (7 citation statements)
References 7 publications
“…In this section, we look at how to tailor deep learning to mobile networking applications from three perspectives, namely, mobile devices and systems, distributed data centers, and changing mobile network environments.

Work: technique (model)
[513]: Filter size shrinking, reducing input channels and late downsampling (CNN)
Howard et al [514]: Depth-wise separable convolution (CNN)
Zhang et al [515]: Point-wise group convolution and channel shuffle (CNN)
Zhang et al [516]: Tucker decomposition (AE)
Cao et al [517]: Data parallelization by RenderScript (RNN)
Chen et al [518]: Space exploration for data reusability and kernel redundancy removal (CNN)
Rallapalli et al [519]: Memory optimizations (CNN)
Lane et al [520]: Runtime layer compression and deep architecture decomposition (MLP, CNN)
Huynh et al [521]: Caching, Tucker decomposition and computation offloading (CNN)
Wu et al [522]: Parameters quantization (CNN)
Bhattacharya and Lane [523]: Sparsification of fully-connected layers and separation of convolutional kernels (MLP, CNN)
Georgiev et al [97]: Representation sharing (MLP)
Cho and Brand [524]: Convolution operation optimization (CNN)
Guo and Potkonjak [525]: Filters and classes pruning (CNN)
Li et al [526]: Cloud assistance and incremental learning (CNN)
Zen et al [527]: Weight quantization (LSTM)
Falcao et al [528]: Parallelization and memory sharing (Stacked AE)
Fang et al [529]: Model pruning and recovery scheme (CNN)
Xu et al [530]: Reusable region lookup and reusable region propagation scheme (CNN)…”
Section: Tailoring Deep Learning To Mobile Network
confidence: 99%
“…Beyond these works, researchers have also successfully adapted deep learning architectures through other designs and sophisticated optimizations, such as parameter quantization [522], [527], sparsification and separation [523], representation and memory sharing [97], [528], convolution operation optimization [524], pruning [525], cloud assistance [526] and compiler optimization [532]. These techniques will be of great significance when embedding deep neural networks into mobile systems…”
Section: A. Tailoring Deep Learning To Mobile Devices and Systems
confidence: 99%
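Among the techniques listed in the quote above, parameter quantization [522], [527] is straightforward to illustrate. The sketch below uses symmetric per-tensor 8-bit quantization with a single scale factor, which is one common scheme assumed here for illustration; the exact schemes of [522] and [527] may differ.

    # Hedged sketch of symmetric per-tensor int8 weight quantization;
    # illustrative only, not the exact scheme of [522] or [527].
    import numpy as np

    def quantize_int8(w):
        """Map float weights to int8 codes plus one float scale."""
        m = float(np.abs(w).max())
        scale = m / 127.0 if m > 0.0 else 1.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.default_rng(0).normal(size=(256, 128)).astype(np.float32)
    q, s = quantize_int8(w)
    print("max abs error:", float(np.abs(dequantize(q, s) - w).max()))
    print("bytes float32 vs int8:", w.nbytes, q.nbytes)   # weights shrink by 4x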
“…A large body of early studies focused on reducing the power consumption of a single server by applying dynamic voltage and frequency scaling (DVFS) [16], low-power chipsets [17], and advanced cooling techniques [18]. Emerging energy-management schemes aim to optimize the energy efficiency of servers equipped with multi-core processors [19], GPUs [20], and smart memory cubes [21]. In contrast to the above energy-efficient computing strategies, our REDUX focuses on reducing the energy cost of large-scale data centers…”
Section: Background and Related Work
confidence: 99%
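The DVFS technique cited in [16] rests on the standard CMOS dynamic-power relation; the simplified textbook form below (an assumption-level sketch, not specific to [16]) shows why lowering voltage and frequency together saves energy:

    % Dynamic power with switching activity \alpha, capacitance C, voltage V, frequency f:
    P_{\mathrm{dyn}} = \alpha C V^{2} f
    % For a compute-bound task of W cycles the runtime is t = W / f, so the energy per task is
    E = P_{\mathrm{dyn}} \, t = \alpha C V^{2} W
    % The sustainable frequency falls roughly linearly with V, so scaling V and f down together
    % reduces power roughly cubically and energy per task roughly quadratically.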
“…Another aspect to consider is the use of devices that require less power but are slower than the ones used in these experiments. Specifically, mobile GPUs are about 10 times slower [6] than the desktop GPUs used. For this simulation, the same variables as before are manipulated: Internet speed and number of nodes…”
Section: Low-Power and Mobile GPUs
confidence: 99%
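A rough, hypothetical model of the kind of simulation described in the quote above (not the citing paper's actual simulator) splits the time of one training round into local compute, scaled by the roughly tenfold mobile-GPU slowdown reported in [6], and communication that depends on Internet speed and the number of nodes:

    # Hypothetical per-round timing model; the cost split and all numbers are assumptions.
    def round_time(compute_s, slowdown, model_mb, net_mbps, nodes):
        """Seconds per round: slowed-down local compute plus exchanging model updates."""
        compute = compute_s * slowdown              # e.g. mobile GPU ~10x slower [6]
        comm = nodes * model_mb * 8.0 / net_mbps    # every node ships its update over the link
        return compute + comm

    for nodes in (2, 8, 32):
        for net_mbps in (10, 100):
            print(nodes, net_mbps, round_time(1.0, 10.0, 50.0, net_mbps, nodes))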
“…Although mobile GPUs have only a tenth of the processing power of their desktop counterparts and achieve considerably worse throughput, they should still be considered a viable alternative, particularly because of power-consumption requirements: mobile GPUs achieve the same classification performance as the best GPUs on the market while taking ten times longer [6] to complete execution. However, since their average power is nearly three orders of magnitude lower than that of the reference GPU, the energy consumed for the same amount of computation is around two orders of magnitude lower…”
Section: Low-Power and Mobile GPUs
confidence: 99%
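The trade-off quoted above comes down to energy = average power × execution time. The figures below are placeholders chosen only to match the quoted orders of magnitude (about ten times slower, about three orders of magnitude lower power); they are not measurements from [6]:

    # Back-of-the-envelope check of the energy claim; placeholder numbers, not data from [6].
    desktop_power_w, desktop_time_s = 200.0, 100.0       # hypothetical reference GPU
    mobile_power_w = desktop_power_w / 1000.0            # ~3 orders of magnitude lower power
    mobile_time_s = desktop_time_s * 10.0                # ~10x longer execution

    desktop_energy_j = desktop_power_w * desktop_time_s  # 20000 J
    mobile_energy_j = mobile_power_w * mobile_time_s     # 200 J
    print(desktop_energy_j / mobile_energy_j)            # ~100x, i.e. two orders of magnitude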