2020
DOI: 10.1109/tpds.2020.3041474
Model Parallelism Optimization for Distributed Inference via Decoupled CNN Structure

Cited by 21 publications (17 citation statements)
References 22 publications
“…Model parallelism [ 70 , 71 , 72 , 73 , 74 , 75 , 76 , 78 , 80 , 81 , 83 , 84 , 86 , 87 , 89 ] attempts to address both issues by employing partitioning strategies with a finer granularity to produce less expensive DNN subtasks, i.e., partitions with fewer parameters and fewer computation requirements than a layer, and foster more adaptable co-inference schemes. The computations required for a single input are distributed across multiple computing entities, reducing the time needed to process the shared input [ 76 ] but delivering a performance that, as opposed to what has been indicated for pipeline parallelism, is highly dependent on the distribution of such computations across devices.…”
Section: DNN Partitioning and Parallelism for Collaborative Inference
confidence: 99%
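The excerpt above describes intra-layer model parallelism: the computation for a single input is distributed across several devices, so each device processes only part of the work for that input. The following is a minimal sketch of this idea, not the method of the cited paper, using a naive NumPy convolution and two hypothetical devices that each hold half of the output filters.

```python
# Minimal sketch of intra-layer model parallelism (illustrative only):
# the output channels of one convolutional layer are split across two
# hypothetical "devices"; each computes its slice of the filters and the
# partial results are concatenated to recover the full output.
import numpy as np

def conv2d(x, w):
    """Naive valid convolution: x is (C_in, H, W), w is (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for o in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                y[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[o])
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))        # shared input broadcast to both devices
w = rng.standard_normal((16, 3, 3, 3))    # 16 output filters in total

y0 = conv2d(x, w[:8])                     # "device 0" computes filters 0..7
y1 = conv2d(x, w[8:])                     # "device 1" computes filters 8..15
y_parallel = np.concatenate([y0, y1], axis=0)

assert np.allclose(y_parallel, conv2d(x, w))  # matches the single-device result
```

As the excerpt notes, how these computations are balanced across devices largely determines the speedup; an uneven split leaves the faster device idle while it waits for the slower one.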
“…This has been mainly motivated by the very nature and the subsequent memory and computing demands of the different layer types considered. Analyses carried out in this context [ 70 , 76 , 83 ] have confirmed that, while convolutional and fully connected layers account for the majority of the memory footprint and computational complexity of CNN models, the effect of non-parametric layers, i.e., pooling layers and activation functions, in this regard and, therefore, their contribution to the overall computational cost of the network, can be considered negligible [ 71 ]. Thus, when partitioned, the non-parametric layers are typically grouped with their corresponding parent layer, i.e., the layer that generated their input, instead of receiving explicit treatment [ 76 ].…”
Section: DNN Partitioning and Parallelism for Collaborative Inference
confidence: 99%
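The grouping rule described above can be illustrated with a short sketch. The layer names and the helper below are hypothetical; the point is only that non-parametric layers (activations, pooling) never form partitions of their own but are attached to the parametric layer that produced their input.

```python
# Minimal sketch, under assumed layer names, of grouping non-parametric
# layers with their parent parametric layer when partitioning a CNN.
NON_PARAMETRIC = {"relu", "pool", "softmax"}

def group_with_parent(layers):
    """layers: ordered list of layer names; returns one list per partition unit."""
    partitions = []
    for layer in layers:
        if layer in NON_PARAMETRIC and partitions:
            partitions[-1].append(layer)   # attach to the preceding parent layer
        else:
            partitions.append([layer])     # a parametric layer starts a new unit
    return partitions

cnn = ["conv1", "relu", "pool", "conv2", "relu", "pool", "fc1", "relu", "fc2", "softmax"]
print(group_with_parent(cnn))
# [['conv1', 'relu', 'pool'], ['conv2', 'relu', 'pool'], ['fc1', 'relu'], ['fc2', 'softmax']]
```

Partitioning decisions then only need to consider the convolutional and fully connected units, which is consistent with the observation that non-parametric layers contribute negligibly to the overall computational cost.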
“…In addition, it impacts various performance metrics such as end-to-end latency, communication size, throughput, accuracy, memory footprint, and energy consumption. Existing research focuses on distributed CNN inference on resource-constrained IoT devices [5-10] to reduce latency, memory footprint, and communication size by using various approaches such as partial data upload to the cloud, training at the cloud with inference at the edge, distributed training in the edge-cloud, or fog-cloud training. Our extensive survey on DL at edge devices has identified the following limitations.…”
Section: Introduction
confidence: 99%
“…In addition, it impacts various performance metrics such as end-to-end latency, communication size, throughput, accuracy, memory footprint, and energy consumption. Existing research focuses on distributed CNN inference on resource-constrained IoT devices [5][6][7][8][9][10] to reduce […] (2) Require enormous computational resources such as processing capabilities and memory footprint for implementing intelligent edge systems. (3) Necessity of using compression techniques to optimize the existing approaches to best utilize the available resources for successful implementation of edge intelligent systems.…”
Section: Introduction
confidence: 99%
“…Even though methods such as model parallelism can be used to split the model between multiple GPUs during both the training [14], [15] and inference [16] phases, and thus avoid memory and latency issues, these methods require a large amount of resources, such as a large number of GPUs and servers, which can incur high costs, especially when working with extreme resolutions such as gigapixel images. Furthermore, in many applications, such as self-driving cars and drone image processing, there is a limit on the hardware that can be mounted, and offloading the computation to external servers is not always possible because of the unreliability of the network connection due to movement and the time-critical nature of the application.…”
Section: Introduction
confidence: 99%