2020 International Conference on Information Science and Education (ICISE-IE)
DOI: 10.1109/icise51755.2020.00107

A Graph Signal Processing Based Strategy for Deep Neural Network Inference

Cited by 5 publications (9 citation statements)
References 7 publications
“…Several studies exist for operator execution order scheduling, such as [2,38,39]. Among these efforts, [38,39] focus on minimizing peak memory consumption by reordering operators for resource-constrained devices (e.g., MCUs), while [2] proposes an optimized scheduling framework for complex models (irregularly wired neural networks). These approaches rely on static shapes only.…”
Section: Related Work
confidence: 99%
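
As a rough illustration of what such operator-order scheduling does, the sketch below enumerates topological orders of a small, made-up operator graph and picks the one with the lowest peak tensor memory. The graph, tensor sizes, and brute-force search are illustrative assumptions only, not the cited frameworks, which use heuristics or optimized search on real models.

from itertools import permutations

# Hypothetical toy graph: op name -> (output size in KB, list of producer ops it consumes).
GRAPH = {
    "input":  (32, []),
    "conv1":  (64, ["input"]),
    "conv2":  (64, ["input"]),
    "concat": (128, ["conv1", "conv2"]),
    "fc":     (8, ["concat"]),
}

def is_topological(order):
    seen = set()
    for op in order:
        if any(dep not in seen for dep in GRAPH[op][1]):
            return False
        seen.add(op)
    return True

def peak_memory(order):
    consumers = {op: [] for op in GRAPH}
    for op, (_, deps) in GRAPH.items():
        for dep in deps:
            consumers[dep].append(op)
    live, peak, executed = {}, 0, set()
    for op in order:
        executed.add(op)
        live[op] = GRAPH[op][0]               # output tensor becomes live
        peak = max(peak, sum(live.values()))  # inputs and output coexist during the op
        for dep in GRAPH[op][1]:              # free inputs whose consumers have all run
            if all(c in executed for c in consumers[dep]):
                live.pop(dep, None)
    return peak

# Brute force is only feasible for tiny graphs; real schedulers use heuristics.
best = min((o for o in permutations(GRAPH) if is_topological(o)), key=peak_memory)
print("best order:", best, "peak memory (KB):", peak_memory(best))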
“…Several techniques have been developed to address TinyML's low-resource challenges, including pruning [39], [40], [41], [42], [43], [44], [45], quantization [46], [47], [48], [49], [50], [39], [51], [52], [53], [54], [55], and neural architecture search (NAS) [53], [56], [57], [58], [59], [60], [61], [62], [63], [64]. These methods reduce the number of model parameters while maintaining accuracy, allowing the models to be deployed on MCUs.…”
Section: AIfES
confidence: 99%
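
For context, the following is a minimal sketch of one of the listed techniques, symmetric post-training int8 weight quantization. The NumPy code, random stand-in weights, and single per-tensor scale are illustrative assumptions, not the implementations of the cited methods.

import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: map [-max_abs, max_abs] to [-127, 127].
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)   # stand-in for trained weights
q, s = quantize_int8(w)
print("int8 bytes:", q.nbytes, "vs float32 bytes:", w.nbytes)       # 4x smaller
print("max reconstruction error:", float(np.max(np.abs(w - dequantize(q, s)))))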
“…By using a similar approach, DNNs were split between edge (Jetson TX2 and NVIDIA Drive PX2 devices) and cloud domains [22]. Finally, memory-efficient patch-based inference for microcontrollers (with only hundreds of KBs of RAM) has been proposed [23], which reduces the peak memory consumption of existing models by 4-8x. Google has published several convolutional neural networks (CNNs) in TensorFlow Hub for human joint keypoint identification [13].…”
Section: Literature Review
confidence: 99%
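
As a rough illustration of the patch-based idea, the sketch below runs a single 3x3 "same" convolution tile by tile with a one-pixel halo, so only a small tile has to be resident at a time while the stitched result matches whole-image execution. The patch size, the plain NumPy convolution, and the single-layer setting are illustrative assumptions, not the cited system.

import numpy as np

def conv3x3(x, k):
    # Reference "same" convolution over the full feature map (zero padded).
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * k)
    return out

def patch_inference(x, k, patch=8):
    h, w = x.shape
    out = np.zeros_like(x)
    for i0 in range(0, h, patch):
        for j0 in range(0, w, patch):
            # Load the tile plus a 1-pixel halo so border outputs are exact;
            # only this small slice is live at a time.
            i1, j1 = min(i0 + patch, h), min(j0 + patch, w)
            ti0, tj0 = max(i0 - 1, 0), max(j0 - 1, 0)
            ti1, tj1 = min(i1 + 1, h), min(j1 + 1, w)
            tile_out = conv3x3(x[ti0:ti1, tj0:tj1], k)
            out[i0:i1, j0:j1] = tile_out[i0 - ti0:i0 - ti0 + (i1 - i0),
                                         j0 - tj0:j0 - tj0 + (j1 - j0)]
    return out

x = np.random.randn(32, 32)
k = np.random.randn(3, 3)
assert np.allclose(patch_inference(x, k), conv3x3(x, k))  # tiled result matches full conv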