2022 14th International Conference on COMmunication Systems & NETworkS (COMSNETS)
DOI: 10.1109/comsnets53615.2022.9668515

DEFER: Distributed Edge Inference for Deep Neural Networks

Abstract: Edge inference is becoming ever more prevalent through its applications, from retail to wearable technology. Clusters of networked, resource-constrained edge devices are becoming common, yet there is no production-ready orchestration system for deploying deep learning models over such edge networks that adopts the robustness and scalability of the cloud. We present SEIFER, a framework utilizing a standalone Kubernetes cluster to partition a given DNN and place these partitions in a distributed manner across an edge …
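The core idea the abstract describes, cutting a chain of layers into partitions that can be placed on separate edge nodes, can be sketched in a few lines. This is a minimal illustration assuming a Keras Sequential model; the function name partition_model and the split indices are ours for illustration, not DEFER's or SEIFER's API.

    import tensorflow as tf

    def partition_model(model, split_points):
        # Cut a Sequential model into len(split_points)+1 sub-models;
        # e.g. split_points=[2, 5] yields layers [0:2], [2:5], [5:],
        # each independently runnable on its own device.
        bounds = [0] + list(split_points) + [len(model.layers)]
        return [tf.keras.Sequential(model.layers[lo:hi])
                for lo, hi in zip(bounds[:-1], bounds[1:])]

    # Running the partitions back-to-back reproduces the full model:
    # for part in partition_model(model, [2, 5]): x = part(x)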

Cited by 15 publications (22 citation statements)
References 13 publications
“…Each worker node is assigned only some of the partitions, which are processed and then reduced back to the central node, thus generating the input for the next layer considered in the Map procedure. The pipelined architecture [77, 79, 85, 91] conceives the workflow as a sequence of n computation stages, corresponding to the n nodes in the hardware infrastructure, and n-1 communication steps for transferring intermediate results between adjacent devices [77]. Both the computation nodes and the execution flow are typically predetermined at configuration time, which simplifies task assignment to a mere sequential ordering and circumscribes the pursued objectives to finding the split points that optimize the performance of the deployed CNN.…”
Section: In Situ Distributed Intelligence · Citation type: mentioning · Confidence: 99%
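The n-stage, (n-1)-link structure this excerpt describes can be emulated compactly: each stage is a worker that consumes from its inbound queue and forwards to the next. This is a single-process sketch, with queues standing in for the device-to-device links and plain callables standing in for the per-node model partitions; none of it is the cited systems' code.

    import threading, queue

    def run_pipeline(stages, inputs):
        # n stages need n+1 queues: the first feeds the pipeline, the
        # last collects results, and the n-1 in between model the links.
        links = [queue.Queue() for _ in range(len(stages) + 1)]

        def worker(fn, inbox, outbox):
            while (x := inbox.get()) is not None:  # None signals shutdown
                outbox.put(fn(x))
            outbox.put(None)                       # propagate shutdown

        threads = [threading.Thread(target=worker,
                                    args=(fn, links[i], links[i + 1]))
                   for i, fn in enumerate(stages)]
        for t in threads:
            t.start()
        for x in inputs:
            links[0].put(x)
        links[0].put(None)

        results = []
        while (y := links[-1].get()) is not None:
            results.append(y)
        for t in threads:
            t.join()
        return results

    # Three toy "partitions" pipelined over three logical nodes:
    # run_pipeline([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3], range(5))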
“…Backed, in terms of communication, by the pipelined architecture introduced in the previous section, pipeline parallelism [77, 79, 82, 85, 88, 90, 91] constitutes the simplest way to distribute the inference workload. It is a parallelism modality inherent to the traditional chain-like architecture of DNNs, which typically consists of a sequence of layers in which each layer’s output depends on the outputs of the preceding layers.…”
Section: DNN Partitioning and Parallelism for Collaborative Inference · Citation type: mentioning · Confidence: 99%
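A quick back-of-envelope model makes the appeal of this modality concrete: a single input still pays every stage and transfer time, but once the pipeline is full, outputs emerge at the pace of the slowest step. The timings below are invented for illustration, not measurements from any of the cited papers.

    # Three stages and two links, illustrative numbers only:
    stage_ms = [12.0, 9.5, 14.0]   # per-partition compute time per input
    link_ms  = [3.0, 3.0]          # n-1 intermediate-tensor transfers

    latency_ms = sum(stage_ms) + sum(link_ms)  # 41.5 ms for one input
    bottleneck_ms = max(stage_ms + link_ms)    # 14.0 ms between outputs
                                               # at steady state
    print(f"latency: {latency_ms} ms; one output every {bottleneck_ms} ms")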
“…Jouhari et al. [14], in order to achieve inference of complex DNN models on unmanned aerial vehicles (UAVs) while avoiding the additional communication delays generated in the air and on the ground, proposed a method for dynamic collaborative DNN model inference over UAV air-to-air communication, which improves real-time DNN inference while effectively utilizing the storage and computational resources of the UAVs. DEFER [15] proposed a distributed edge inference framework that partitions the model and performs distributed inference on resource-constrained devices, effectively reducing device energy consumption.…”
Section: D2D Inference · Citation type: mentioning · Confidence: 99%
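The hand-off both systems rely on, one device shipping its partition's intermediate tensor to the next, can be sketched as a length-prefixed transfer over a socket. The wire format here (pickled NumPy array with an 8-byte length prefix) is our assumption for illustration, not DEFER's actual protocol.

    import pickle, socket, struct
    import numpy as np

    def send_tensor(sock, tensor):
        # Serialize the intermediate activation and prefix its length
        # so the receiver knows how many bytes to expect.
        payload = pickle.dumps(tensor)
        sock.sendall(struct.pack("!Q", len(payload)) + payload)

    def recv_tensor(sock):
        (length,) = struct.unpack("!Q", _recv_exact(sock, 8))
        return pickle.loads(_recv_exact(sock, length))

    def _recv_exact(sock, n):
        # TCP delivers a stream, not messages, so loop until n bytes arrive.
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed mid-message")
            buf += chunk
        return buf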