DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices

Lane, Nicholas D.; Bhattacharya, Sourav; Georgiev, Petko; Forlivesi, Claudio; Jiao, Lei; Qendro, Lorena; Kawsar, Fahim

doi:10.1109/ipsn.2016.7460664

Cited by 382 publications

(216 citation statements)

References 26 publications

“…First, these research works target different computing devices, such as DSP [5, 32], GPU [24], and LPU [33]. Second, these efforts aim to leverage several programming models on mobile devices, such as OpenCL [34], Vulkan [34], CUDA [33] (only available on compatible devices), and RenderScript [24]. Several of these attempts take advantage of linear algebra-based optimization, such as singular value decomposition [33] and Tucker decomposition [34], to reduce the complexity of convolution computations.…”

Section: Related Work (mentioning)

confidence: 99%

A collaborative CPU‐GPU approach for deep learning on mobile devices

Valery, Liu (2019), Concurrency and Computation

As mobile devices become more prevalent, users tend to reassess their expectations regarding the personalization of mobile services. The data collected by a mobile device's sensors provide an opportunity to gain insight into the user's profile. Recently, deep learning has gained momentum and has become the method of choice for solving machine learning problems. Interestingly, training a deep neural network on a mobile device is often mistakenly regarded as cumbersome. For instance, several deep learning frameworks only provide a CPU-based implementation for prediction tasks on a mobile device. In contrast to servers, a mobile computing environment imposes many domain-specific constraints that invite us to review the general computing approach used in a deep learning framework implementation. In this paper, we propose a deep learning framework that has been specifically designed for mobile device platforms. Our approach relies on the collaboration of the multicore CPU and the integrated GPU to accelerate deep learning computation on mobile devices. Our work exploits the shared memory architecture of mobile devices to promote CPU-GPU collaboration without any data copying. We analyze our approach with regard to three factors: performance/portability trade-off, power efficiency, and memory management.

KEYWORDS: deep learning, energy efficient, GPGPU, heterogeneous system, mobile computing, OpenCL

INTRODUCTION

As mobile devices get more sophisticated, the need for more optimization and personalization of mobile services has become a primary expectation of users. The latest mobile devices provide the opportunity to collect information about the user throughout the day via multiple mobile sensors, wearable sensors, and the Internet of things (IoT). In contrast, desktop and laptop computers are limited in the type of data that can be collected. For instance, even if laptops are suitable for mobile use, most do not have GPS sensors and are therefore not location aware. The aggregation of the collected data provides an opportunity to gain insight into mobile users' profiles and then personalize the mobile experience of a particular user.

In recent years, deep learning has become the method of choice for solving complex representation learning problems [1]. The computational model comprises a set of processing layers, each of which successively transforms its input into a slightly more abstract representation than that in the previous layer. One fundamental aspect of deep learning is that each layer relies on a simple module that does not require any particular domain expertise for its design. This stack of processing layers attempts to produce a new set of input data that reduces irrelevant variations and amplifies discriminative information. Using a layer composition, deep learning methods can learn complex nonlinear functions by exploiting a general-purpose learning procedure. Deep learning is now regarded as the state-of-the-art method for solving many complex problems, such as image processing.

The authors claim that deep le...

“…For example, DeepX [18] accelerates the deep learning inference on mobile devices by using the DSP, GPU and using runtime layer compression to decompose the deep model across available hardware resources. However, in their paper results, DeepX [18] used the GPU only on the Nvidia Tegra K1 Soc and relied on using DSP on the more popular Snapdragon Qualcomm SoC. Also, DeepX is not available for the public developers to use and does not integrate within popular deep learning frameworks.…”

Section: Related Work (mentioning)

confidence: 99%

“…Therefore, our can be used to accelerate other models than convolution neural networks. Finally, our system can be used to run models trained with TensorFlow out of the box without any model conversion or preparation as needed by [20], and [18]. …”

Section: Related Work (mentioning)

confidence: 99%

RSTensorFlow

Alzantot, Wang, Ren, et al. (2017), Proceedings of the 1st International Workshop on Deep Learning for Mobile Systems and Applications

Mobile devices have become an essential part of our daily lives. By virtue of both their increasing computing power and the recent progress made in AI, mobile devices have evolved to act as intelligent assistants in many tasks rather than a mere way of making phone calls. However, popular and commonly used tools and frameworks for machine intelligence still lack the ability to make proper use of the available heterogeneous computing resources on mobile devices. In this paper, we study the benefits of utilizing the heterogeneous (CPU and GPU) computing resources available on commodity Android devices while running deep learning models. We leveraged the heterogeneous computing framework to accelerate the execution of deep learning models on commodity Android devices. Our system is implemented as an extension to the popular open-source framework TensorFlow. By integrating our acceleration framework tightly into TensorFlow, machine learning engineers can now easily benefit from the heterogeneous computing resources on mobile devices without the need for any extra tools. We evaluate our system on different Android phone models to study the trade-offs of running different neural network operations on the GPU. We also compare the performance of running different model architectures, such as convolutional and recurrent neural networks, on the CPU only versus using heterogeneous computing resources. Our results show that GPUs on the phones are capable of offering substantial performance gains in matrix multiplication. Therefore, models that involve multiplication of large matrices can run much faster (approx. 3 times faster in our experiments) due to GPU support.

“…We are also aiming to enable powerful CNN compression techniques [13,16] in the CNN optimization workflow and expose all their optimization parameters. Indeed, while optimizing the building blocks of CNNs is important, it is even more important to ensure that no unnecessary computation and data movement takes place.…”

Section: Open Call for Collaborative Optimization of CNNs (mentioning)

confidence: 99%