2018 IEEE Real-Time Systems Symposium (RTSS)
DOI: 10.1109/rtss.2018.00020
PredJoule: A Timing-Predictable Energy Optimization Framework for Deep Neural Networks

Cited by 30 publications (7 citation statements) · References 31 publications
“…where the coefficients are model parameters. Using this simple, yet effective, model we can determine the effect of frequency scaling on inference time and throughput. We also estimate the energy per request for a given model at a given frequency as:…”
Section: Model Recommendation
Citation type: mentioning (confidence: 99%)
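Read literally, the excerpt posits a per-request latency that scales with processor frequency, with energy per request derived from it. Below is a minimal sketch of such a model, assuming a latency of the form alpha + beta/f and a measured average power; the symbols, functional form, and numbers are assumptions for illustration, not the citing paper's notation.

```python
# Hypothetical sketch of the frequency-scaling model the excerpt describes.
# alpha, beta, and the constant-power assumption are illustrative only.

def inference_time(freq_ghz: float, alpha: float, beta: float) -> float:
    """Per-request inference time: a fixed cost plus a frequency-scaled cost."""
    return alpha + beta / freq_ghz

def throughput(freq_ghz: float, alpha: float, beta: float) -> float:
    """Requests per second is the inverse of per-request time."""
    return 1.0 / inference_time(freq_ghz, alpha, beta)

def energy_per_request(freq_ghz: float, alpha: float, beta: float,
                       power_watts: float) -> float:
    """Energy per request: average power times per-request time."""
    return power_watts * inference_time(freq_ghz, alpha, beta)

# Example: fitted constants for some model at 1.5 GHz (illustrative values).
t = inference_time(1.5, alpha=0.010, beta=0.045)            # seconds
e = energy_per_request(1.5, 0.010, 0.045, power_watts=4.0)  # joules
print(f"latency {t * 1e3:.1f} ms, energy {e:.3f} J")
```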
“…We assume that these devices may run applications such as mobile AR [5] or object recognition [15] that involve running deep learning inference using a deep neural network (DNN) model. The application may impose real-time latency constraints on DNN inference processing, which requires that such processing be performed on the device or at a nearby edge node (rather than in the cloud) [3,9]. We assume that the device (or edge node) has specialized hardware in the form of an embedded edge accelerator to accelerate DNN inference.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
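To make the placement argument in the excerpt concrete, a toy feasibility check with hypothetical round-trip and inference times shows why a tight frame deadline rules out the cloud; all names and numbers below are invented.

```python
# Toy illustration: if the round-trip to the cloud alone exceeds the frame
# deadline, inference must run on the device or a nearby edge node.
# All timings are hypothetical.

def placement(deadline_ms: float, infer_ms: dict, rtt_ms: dict) -> str:
    """Pick the nearest tier whose network RTT plus inference time fits."""
    for tier in ("device", "edge", "cloud"):  # try nearest tier first
        if rtt_ms[tier] + infer_ms[tier] <= deadline_ms:
            return tier
    return "infeasible"

# 33 ms per-frame deadline for a 30 fps mobile AR pipeline (illustrative)
print(placement(33.0,
                infer_ms={"device": 28.0, "edge": 12.0, "cloud": 8.0},
                rtt_ms={"device": 0.0, "edge": 5.0, "cloud": 60.0}))
# -> "device": the cloud RTT alone (60 ms) already misses the 33 ms deadline
```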
“…ApNet [12] applies approximation techniques to each layer of a DNN, trading accuracy against latency. PredJoule [15] optimizes the energy of running DNN workloads: it adjusts the power configuration based on workload latency.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
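A minimal sketch of the latency-aware power selection attributed to PredJoule here, assuming a profiled table mapping each power configuration to a measured (latency, energy) pair; the configuration space and numbers are invented for illustration, not PredJoule's actual design.

```python
# Among the available power configurations, pick the lowest-energy one whose
# profiled latency still meets the workload's deadline. Hypothetical data.

# (cpu_freq_mhz, gpu_freq_mhz) -> (latency_s, energy_j) for one DNN workload
profiles = {
    ( 806,  420): (0.095, 0.62),
    (1190,  650): (0.061, 0.55),
    (1734,  850): (0.043, 0.71),
    (2035, 1122): (0.036, 0.90),
}

def pick_config(deadline_s: float):
    """Return the config minimizing energy among those meeting the deadline."""
    feasible = {cfg: (lat, en) for cfg, (lat, en) in profiles.items()
                if lat <= deadline_s}
    if not feasible:
        return None  # no configuration can meet this deadline
    return min(feasible, key=lambda cfg: feasible[cfg][1])

print(pick_config(0.050))  # -> (1734, 850): meets 50 ms at the lowest energy
```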
“…Research in autonomous driving is an active field, with many papers published on key problems: scheduling the layers of DNNs [12], [14], [15], scheduling DNN memory allocation [13], applying real-time scheduling [36], micro-service support [40], heterogeneity studies [26], [39], and so on. This study builds on these many prior efforts but has a very different objective and level of focus.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…In the context of mobile devices, Lane et al. [19] proposed two runtime algorithms to decompose a DNN model across available processors in order to improve performance and energy efficiency. Very recently, a similar goal has been pursued by Kang and Chung [20] and by Bateni et al. [21] Hong et al. [22] presented an extended synchronous dataflow model aimed at explicitly expressing the parallelism of loop structures, allowing the computational graph of a DNN to be modeled during the training phase. Casini et al. [23] proposed approaches for bounding the worst-case response time of parallel tasks implemented with thread pools, using a task model inspired by Tensorflow.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
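As a rough illustration of the decomposition idea credited to Lane et al. above, the sketch below assigns each DNN layer to a processor by brute force so as to minimize total energy under a latency budget; the per-layer costs are hypothetical and inter-processor transfer costs are ignored.

```python
# Toy layer-to-processor decomposition: minimize energy subject to a latency
# budget. Per-layer costs are invented; real systems also model data transfer.
from itertools import product

# (latency_s, energy_j) per layer on each processor (hypothetical numbers)
costs = {
    "cpu": [(0.020, 0.10), (0.050, 0.30), (0.015, 0.08)],
    "gpu": [(0.008, 0.18), (0.012, 0.22), (0.006, 0.12)],
}
n_layers = 3

def best_placement(budget_s: float):
    """Exhaustively search all layer-to-processor assignments."""
    best = None
    for placement in product(("cpu", "gpu"), repeat=n_layers):
        lat = sum(costs[p][i][0] for i, p in enumerate(placement))
        en  = sum(costs[p][i][1] for i, p in enumerate(placement))
        if lat <= budget_s and (best is None or en < best[0]):
            best = (en, lat, placement)
    return best

print(best_placement(0.040))
# -> (0.44, 0.038, ('cpu', 'gpu', 'gpu')): the cheapest placement within 40 ms
```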