On the Evaluation of Energy-Efficient Deep Learning Using Stacked Autoencoders on Mobile GPUs

Falcão, Gabriel; Alexandre, Luís A.; Marques, José P.; Frazão, Xavier; Maria, Joao

doi:10.1109/pdp.2017.98

Cited by 10 publications

(7 citation statements)

References 7 publications


“…In this section, we look at how to tailor deep learning to mobile networking applications from three perspectives, namely, mobile devices and systems, distributed data centers, and changing mobile network environments.

Reference: technique (architecture)
[513]: Filter size shrinking, reducing input channels and late downsampling (CNN)
Howard et al. [514]: Depth-wise separable convolution (CNN)
Zhang et al. [515]: Point-wise group convolution and channel shuffle (CNN)
Zhang et al. [516]: Tucker decomposition (AE)
Cao et al. [517]: Data parallelization by RenderScript (RNN)
Chen et al. [518]: Space exploration for data reusability and kernel redundancy removal (CNN)
Rallapalli et al. [519]: Memory optimizations (CNN)
Lane et al. [520]: Runtime layer compression and deep architecture decomposition (MLP, CNN)
Huynh et al. [521]: Caching, Tucker decomposition and computation offloading (CNN)
Wu et al. [522]: Parameters quantization (CNN)
Bhattacharya and Lane [523]: Sparsification of fully-connected layers and separation of convolutional kernels (MLP, CNN)
Georgiev et al. [97]: Representation sharing (MLP)
Cho and Brand [524]: Convolution operation optimization (CNN)
Guo and Potkonjak [525]: Filters and classes pruning (CNN)
Li et al. [526]: Cloud assistance and incremental learning (CNN)
Zen et al. [527]: Weight quantization (LSTM)
Falcao et al. [528]: Parallelization and memory sharing (Stacked AE)
Fang et al. [529]: Model pruning and recovery scheme (CNN)
Xu et al. [530]: Reusable region lookup and reusable region propagation scheme (CNN)…”

Section: Tailoring Deep Learning To Mobile Network (mentioning)

confidence: 99%

“…Beyond these works, researchers also successfully adapt deep learning architectures through other designs and sophisticated optimizations, such as parameters quantization [522], [527], sparsification and separation [523], representation and memory sharing [97], [528], convolution operation optimization [524], pruning [525], cloud assistance [526] and compiler optimization [532]. These techniques will be of great significance when embedding deep neural networks into mobile systems.…”

Section: A. Tailoring Deep Learning To Mobile Devices and Systems (mentioning)

confidence: 99%


Deep Learning in Mobile and Wireless Networking: A Survey

Zhang, Patras, Haddadi. IEEE Commun. Surv. Tutorials, 2019.


The rapid uptake of mobile devices and the rising popularity of mobile applications and services pose unprecedented demands on mobile and wireless networking infrastructure. Upcoming 5G systems are evolving to support exploding mobile traffic volumes, real-time extraction of fine-grained analytics, and agile management of network resources, so as to maximize user experience. Fulfilling these tasks is challenging, as mobile environments are increasingly complex, heterogeneous, and evolving. One potential solution is to resort to advanced machine learning techniques, in order to help manage the rise in data volumes and algorithm-driven applications. The recent success of deep learning underpins new and powerful tools that tackle problems in this space. In this paper we bridge the gap between deep learning and mobile and wireless networking research, by presenting a comprehensive survey of the crossovers between the two areas. We first briefly introduce essential background and state-of-the-art in deep learning techniques with potential applications to networking. We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems. Subsequently, we provide an encyclopedic review of mobile and wireless networking research based on deep learning, which we categorize by different domains. Drawing from our experience, we discuss how to tailor deep learning to mobile environments. We complete this survey by pinpointing current challenges and open future directions for research.


“…A large body of early studies focused on reducing power consumption of a single server by applying the dynamic voltage and frequency scaling technique (i.e., DVFS) [16], low-power chipsets [17], and advanced cooling techniques [18]. Emerging energy-management schemes aim to optimize energy efficiency of servers equipped with multi-core processors [19], GPUs [20], and smart memory cubes [21]. In contrast to the above energy-efficient computing strategies, our REDUX pays attention to reducing energy cost of large-scale data centers.…”

Section: Background and Related Work (mentioning)

confidence: 99%

Exploiting Renewable Energy and UPS Systems to Reduce Power Consumption in Data Centers

et al. 2022


To develop environmentally friendly and energy-efficient data centers, it is prudent to leverage on-site renewable sources like solar and wind. Data centers deploy distributed UPS systems to improve efficiency, scalability, and reliability of UPS systems, thereby handling the intermittent nature of renewable energy. We propose a renewable-energy manager called REDUX to (1) offer a smart way of managing energy supply of data centers powered by grid and renewable energy and (2) maintain a desirable balance between energy cost and system performance. To achieve this overarching objective, REDUX judiciously orchestrates distributed UPS devices (i.e., recharge or discharge) to allocate energy resources when (1) grid price is at low or high states or (2) renewable energy generation is at a low or fluctuating level. REDUX not only guarantees the stable operation of daily workload conditions, but also cuts back the energy cost of data centers by improving power resource utilization. Compared with the existing strategies, REDUX demonstrates a prominent capability of mitigating average peak workload and boosting renewable-energy utilization.

Index Terms: Renewable energy, uninterruptible power supply (UPS), distributed UPS systems, resource management, energy cost, data centers.

• Energy cost of large-scale data centers is skyrocketing.
• Consuming renewable energy in data centers brings economical and environmental benefits.


“…Another aspect to consider consists of using devices that require less power but are slower than the ones used in these experiments. Specifically, mobile GPUs are about 10 times slower [6] than the desktop GPUs used. For this simulation, the same variables as before are manipulated: Internet speed and number of nodes.…”

Section: Low-power and Mobile GPUs (mentioning)

confidence: 99%

“…Although mobile GPUs have only a tenth of the processing power as their desktop counterpart and achieve considerably worse throughput performance, they should still be considered as a viable alternative, in particular because of power consumption requirements, with mobile GPUs achieving the same classification performance as the best GPUs in the market, but taking ten times longer [6] to complete execution. However, since their average power is nearly three orders of magnitude lower than the reference GPU, it results in energy consumption around two orders of magnitude lower for the same amount of computation.…”

Section: Low-power and Mobile GPUs (mentioning)

confidence: 99%
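
The energy claim in the statement above follows from energy = average power × execution time. A minimal sketch of that arithmetic, assuming the round figures in the quote (power about three orders of magnitude lower, runtime about ten times longer); `energy_ratio`, `POWER_RATIO` and `TIME_RATIO` are hypothetical names chosen for illustration, and the values are not measurements from the paper:

```python
import math


def energy_ratio(power_ratio: float, time_ratio: float) -> float:
    """Return E_mobile / E_desktop for the same workload.

    Energy is average power times execution time, so the energy ratio is
    the product of the power ratio and the time ratio.
    """
    return power_ratio * time_ratio


# Assumed round numbers read off the quoted statement, not measured data.
POWER_RATIO = 1e-3  # mobile average power ~ three orders of magnitude lower
TIME_RATIO = 10.0   # mobile execution ~ ten times longer

ratio = energy_ratio(POWER_RATIO, TIME_RATIO)
orders_saved = -math.log10(ratio)  # orders of magnitude of energy saved

print(f"E_mobile / E_desktop = {ratio:g}")
print(f"orders of magnitude saved = {orders_saved:g}")
```

Under these assumptions the tenfold runtime penalty cancels one of the three orders of magnitude gained in power, which is consistent with the "around two orders of magnitude" figure in the quote.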

Distributed Learning of CNNs on Heterogeneous CPU/GPU Architectures

Marques, Falcão, Alexandre. Applied Artificial Intelligence, 2018.


Convolutional Neural Networks (CNNs) have been shown to be powerful classification tools in tasks that range from check reading to medical diagnosis, reaching close to human perception, and in some cases surpassing it. However, the problems to solve are becoming larger and more complex, which translates to larger CNNs and longer training times (the computationally complex part) that not even the adoption of Graphics Processing Units (GPUs) could keep up with. This problem is partially solved by using more processing units and distributed training methods that are offered by several frameworks dedicated to neural network training, such as Caffe, Torch or TensorFlow. However, these techniques do not take full advantage of the possible parallelization offered by CNNs and the cooperative use of heterogeneous devices with different processing capabilities, clock speeds, memory size, among others. This paper presents a new method for the parallel training of CNNs that can be considered as a particular instantiation of model parallelism, where only the convolutional layer is distributed. In fact, the convolutions processed during training (forward and backward propagation included) represent 60-90% of global processing time. The paper analyzes the influence of network size, bandwidth, batch size, number of devices, including their processing capabilities, and other parameters. Results show that this technique is capable of diminishing the training time without affecting the classification performance for both CPUs and GPUs. For the CIFAR-10 dataset, using 1
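
The 60-90% figure in the abstract above limits how much distributing only the convolutional layers can help. A minimal sketch of that limit using Amdahl's law, which is a standard bound and not a formula taken from the paper; `amdahl_speedup` and its parameters are hypothetical names, the devices are assumed identical, and communication cost (which the paper studies through bandwidth) is ignored, so real speedups would be lower:

```python
def amdahl_speedup(parallel_fraction: float, n_devices: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the
    runtime is spread evenly over `n_devices` identical devices."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_devices)


# Convolution share of training time, taken from the abstract's 60-90% range.
for fraction in (0.6, 0.9):
    for devices in (2, 4, 8):
        bound = amdahl_speedup(fraction, devices)
        print(f"conv share {fraction:.0%}, {devices} devices: at most {bound:.2f}x")
```

Even with unlimited devices the bound is 1 / (1 - fraction), that is 2.5x at a 60% convolution share and 10x at 90%, because the non-convolutional part of training stays serial in this model.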
