2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)
DOI: 10.1109/asp-dac52403.2022.9712574
Efficient Computer Vision on Edge Devices with Pipeline-Parallel Hierarchical Neural Networks

Cited by 15 publications (9 citation statements). References 19 publications.
“…Backed, in terms of the communication, by the pipelined architecture introduced in the previous section, pipeline parallelism [ 77 , 79 , 82 , 85 , 88 , 90 , 91 ] constitutes the simplest way to distribute the inference workload. It is a parallelism modality inherent to the traditional chain-like architecture of DNNs, which typically consists of a sequence of layers in which each layer’s output is dependent on the output provided by its previous layers.…”
Section: DNN Partitioning and Parallelism for Collaborative Inference
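The statement above describes pipeline parallelism over the chain-like structure of a DNN: the layer sequence is cut into contiguous stages, each stage runs on its own device, and intermediate results flow from stage to stage so that several inputs are in flight at once. The minimal sketch below illustrates the idea with plain Python threads and queues standing in for devices and communication links; the two-stage split and the toy "layers" are illustrative assumptions, not the partitioning used by any of the cited works.

```python
# Minimal sketch of pipeline parallelism over a chain-like DNN
# (hypothetical stage split; threads and queues stand in for devices and links).
import queue, threading

def make_stage(stage_layers, in_q, out_q):
    """Run a contiguous slice of the layer chain on its own 'device' (thread)."""
    def worker():
        while True:
            x = in_q.get()
            if x is None:                 # shutdown signal, forwarded downstream
                out_q.put(None)
                break
            for layer in stage_layers:    # each layer consumes its predecessor's output
                x = layer(x)
            out_q.put(x)                  # intermediate result sent to the next device
    return threading.Thread(target=worker, daemon=True)

# Toy 4-layer "DNN": each layer is just a function adding a constant.
layers = [lambda x, i=i: x + i for i in range(4)]

# Split the chain into two stages (one per device); queues model the links.
q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
stages = [make_stage(layers[:2], q0, q1), make_stage(layers[2:], q1, q2)]
for s in stages:
    s.start()

# Several frames are in flight at once: while the second device processes
# frame k, the first device is already working on frame k+1.
for frame in range(3):
    q0.put(frame)
q0.put(None)

results = []
while (r := q2.get()) is not None:
    results.append(r)
print(results)   # [6, 7, 8]: each frame passed through all four layers (+0+1+2+3)
```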
“…More specifically, the objective of horizontally distributing a DNN to parallelize the computations required for inference is to improve the performance of the entire system, mainly in terms of time-specific metrics. Essentially it seeks to minimize latency [ 70 , 71 , 75 , 78 , 79 , 80 , 81 , 82 , 87 , 88 , 89 , 90 ] or maximize throughput, i.e., the number of performed inferences per second, when dealing with data streams [ 72 , 76 , 77 , 85 ], and, to a lesser extent and in terms of energy efficiency, to minimize power consumption in a few studies [ 79 , 85 , 86 ]. In particular, as regards the objective of minimizing latency, in nearly all the studies analyzed, this refers to the time cost incurred for the end-to-end execution of the exploited DNN model, being computed as the sum of the computation cost derived from the execution of the DNN partitions in the different devices involved and the transmission cost reflecting the time necessary to communicate intermediate results between nodes.…”
Section: DNN Partitioning and Parallelism for Collaborative Inference
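The latency objective described in the statement above is the sum of the computation cost of each partition on its device and the transmission cost of the intermediate results exchanged between consecutive nodes. The following sketch of that accounting uses hypothetical numbers; the function name, cost figures, and bandwidths are illustrative and not taken from any cited study.

```python
# Illustrative end-to-end latency model for a partitioned DNN:
# total latency = sum of per-device computation time
#               + time to transmit intermediate results between consecutive devices.

def end_to_end_latency(compute_ms, intermediate_mb, link_mbps):
    """compute_ms[i]      : computation time of partition i on its device (ms)
       intermediate_mb[i] : size of the tensor sent from partition i to i+1 (MB)
       link_mbps[i]       : bandwidth of the link between device i and i+1 (Mbit/s)"""
    comp = sum(compute_ms)
    comm = sum(8 * size / bw * 1000            # MB -> Mbit, s -> ms
               for size, bw in zip(intermediate_mb, link_mbps))
    return comp + comm

# Example: three partitions on three edge devices, two inter-device links.
print(end_to_end_latency(compute_ms=[12.0, 18.0, 9.0],
                         intermediate_mb=[0.5, 0.25],
                         link_mbps=[100, 100]))   # 39 ms compute + 60 ms transfer = 99.0 ms
```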
“…Therefore, we need to consider how to minimize the total cost of device-device collaborative inference in application scenarios to improve task performance. Goel et al. [81] verify that the hierarchical DNN architecture is well suited to parallel processing on multiple edge devices, and create a parallel inference system for hierarchical DNNs targeting computer vision problems. The method balances the load between cooperating devices and reduces the communication cost, so that multiple video frames can be processed simultaneously at higher throughput.…”
Section: Total Cost Minimization
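The statement above credits the cited system with balancing load across cooperating edge devices so that several frames can be processed concurrently. The sketch below shows one generic way such a balanced assignment could be computed, a greedy longest-processing-time heuristic; the node names, per-node costs, and the heuristic itself are illustrative assumptions, not the partitioning algorithm of the cited paper.

```python
# Sketch: balance the components of a hierarchical DNN across edge devices
# with a greedy longest-processing-time heuristic (illustrative only).
import heapq

def balance(nodes, num_devices):
    """nodes: dict {node_name: estimated per-frame cost (ms)}.
       Returns device id -> list of assigned hierarchy nodes."""
    heap = [(0.0, d) for d in range(num_devices)]      # (current load, device id)
    heapq.heapify(heap)
    assignment = {d: [] for d in range(num_devices)}
    for name, cost in sorted(nodes.items(), key=lambda kv: -kv[1]):
        load, dev = heapq.heappop(heap)                # pick the least-loaded device
        assignment[dev].append(name)
        heapq.heappush(heap, (load + cost, dev))
    return assignment

# Hypothetical hierarchy: a root classifier plus three specialized children.
nodes = {"root": 20.0, "child_animals": 12.0, "child_vehicles": 9.0, "child_other": 6.0}
print(balance(nodes, num_devices=2))
# {0: ['root', 'child_other'], 1: ['child_animals', 'child_vehicles']}
```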