Proceedings of the 42nd Annual International Symposium on Computer Architecture 2015
DOI: 10.1145/2749469.2749472
DjiNN and Tonic

Abstract: As applications such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Echo continue to gain traction, web-service companies are adopting large deep neural networks (DNNs) for machine learning challenges such as image processing, speech recognition, and natural language processing, among others. A number of open questions arise as to the design of a server platform specialized for DNNs and how modern warehouse-scale computers (WSCs) should be outfitted to provide DNN as a service for these applications. In this…

Cited by 124 publications (3 citation statements)
References 32 publications
“…There is also a large body of work accelerating machine learning-based applications using various accelerator platforms [1, 7–9, 12–14, 19, 23–25, 30, 31, 36, 38, 41, 42, 58, 63, 80]. Specifically, GPUs have been shown to offer orders of magnitude performance improvement over multicore CPUs [24, 25, 27, 58]. This is because many machine learning algorithms spend a large fraction of their execution time performing matrix multiplication, which can be parallelized on the large number of threads offered by GPUs.…”
Section: Related Work
confidence: 99%
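The statement above attributes the GPU speedup to matrix multiplication dominating DNN execution time. A minimal NumPy sketch (illustrative only; the layer shapes and names are made up, not from the cited works) shows how a fully connected layer's forward pass reduces to exactly one matrix multiplication, the operation GPUs parallelize across thousands of threads:

```python
import numpy as np

def dense_layer(x, weights, bias):
    """Forward pass of one dense layer: a single matmul plus a ReLU.
    The x @ weights product is the work a GPU would parallelize."""
    return np.maximum(x @ weights + bias, 0.0)

# A batch of 64 inputs with 256 features through a 256 -> 128 layer.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 256))
w = rng.standard_normal((256, 128))
b = np.zeros(128)

y = dense_layer(x, w, b)
print(y.shape)  # (64, 128)
```

Each of the 64 × 128 output elements is an independent dot product, which is why the computation maps naturally onto a GPU's thread grid.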
“…To improve a WSC's performance and cost-effectiveness, a large volume of studies discuss resource- and performance-aware job scheduling [18–21, 36–38, 44, 49, 52–54, 56], adaptive power management [29, 31, 39, 40, 42], novel architectures [9, 10, 27], and systematic performance investigation methods [2, 17, 22, 25, 30, 33, 47, 48, 50]. Unfortunately, to the best of our knowledge, there is no standard methodology to evaluate the holistic performance of a WSC running thousands of distinct jobs.…”
Section: Need for a Systematic Performance Evaluation Methodology
confidence: 99%
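To make "resource-aware job scheduling" concrete, here is a hedged sketch of one simple policy: greedily place each job on the machine with the most free capacity. This is a generic worst-fit heuristic for illustration only; the `schedule` function and its inputs are hypothetical and not drawn from any of the cited scheduling systems.

```python
def schedule(jobs, capacities):
    """Place jobs on machines, largest demand first, always choosing
    the machine with the most remaining capacity (worst-fit).

    jobs:       list of (name, resource_demand) tuples
    capacities: list of free capacity per machine
    Returns {job_name: machine_index}; raises if a job cannot fit."""
    free = list(capacities)
    placement = {}
    for name, demand in sorted(jobs, key=lambda j: -j[1]):
        best = max(range(len(free)), key=lambda m: free[m])
        if free[best] < demand:
            raise RuntimeError(f"no machine can fit job {name!r}")
        free[best] -= demand
        placement[name] = best
    return placement

print(schedule([("a", 4), ("b", 3), ("c", 2)], [5, 5]))
```

Worst-fit spreads load evenly, which helps tail latency; real WSC schedulers layer many more constraints (priorities, affinity, preemption) on top of heuristics like this.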
“…In order to achieve high-efficiency inference for these immersive services on mobile devices, we can offload some computation tasks to effectively leverage computing resources in edge servers and cloud servers. Coincidentally, we find that the deep neural network (DNN) is the most common ML technique [12], and the DNN model can be split into different portions with layer-level partitioning [13]. Therein, partial offloading can sometimes outperform binary offloading because an internal layer inside the DNN model usually yields a smaller intermediate output than the input layer.…”
Section: Introduction
confidence: 99%
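The argument above (partial offloading wins when an internal layer's activation is smaller than the raw input) can be sketched as a split-point search. This is an illustrative toy, assuming we only minimize uploaded bytes; the layer sizes and the `best_split` helper are invented, not taken from the cited partitioning work:

```python
def best_split(output_sizes):
    """output_sizes[i] = bytes produced by layer i of the DNN.
    Splitting after layer i means the device uploads output_sizes[i]
    bytes to the server. Return (layer_index, bytes_uploaded) that
    minimizes the upload."""
    return min(enumerate(output_sizes), key=lambda p: p[1])

# Raw input is 600 KB; internal activations shrink it considerably.
sizes = [600_000, 150_000, 40_000, 90_000, 4_000]
layer, cost = best_split(sizes)
print(layer, cost)  # split after layer 4: upload 4000 bytes
```

A real partitioner would also weigh per-layer compute cost on the device versus the server and the link bandwidth, not just transfer size.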