2022 25th Euromicro Conference on Digital System Design (DSD)
DOI: 10.1109/dsd57027.2022.00048

PipeEdge: Pipeline Parallelism for Large-Scale Model Inference on Heterogeneous Edge Devices

Cited by 15 publications (3 citation statements)
References 44 publications

“…Large-scale DNN deployment and inference on heterogeneous edge devices and networks was designed by Hu et al. 16 It uses dynamic programming to find the optimal partition of the model under pipeline parallelism. Furthermore, Cai et al. 17 presented ParaTra, a transformer model inference framework that works on edge devices, which often have limited GPU resources.…”
Section: Related Work
confidence: 99%
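
The dynamic-programming partitioning quoted above can be illustrated with a minimal sketch. The per-layer costs, the per-device speeds, and the simplification of ignoring communication time are all assumptions made here for illustration; this is the general bottleneck-minimizing recursion, not PipeEdge's actual algorithm.

```python
from functools import lru_cache

# Hypothetical inputs: baseline cost per layer and relative speed per device.
layer_cost = [4.0, 6.0, 2.0, 8.0, 3.0]
device_speed = [1.0, 0.5, 2.0]
N, D = len(layer_cost), len(device_speed)

def stage_time(lo, hi, dev):
    """Time for device `dev` to run the contiguous layers [lo, hi)."""
    return sum(layer_cost[lo:hi]) / device_speed[dev]

@lru_cache(maxsize=None)
def best(lo, dev):
    """Minimum achievable bottleneck (max stage time) when layers [lo, N)
    remain and devices [dev, D) are still available, in order."""
    if lo == N:
        return 0.0           # all layers placed
    if dev == D:
        return float("inf")  # layers remain but no devices do
    options = [best(lo, dev + 1)]  # option: skip this device entirely
    for hi in range(lo + 1, N + 1):
        # Device `dev` runs layers [lo, hi); recurse on the remainder.
        options.append(max(stage_time(lo, hi, dev), best(hi, dev + 1)))
    return min(options)

print(f"minimized pipeline bottleneck: {best(0, 0):.2f}s")
```

In steady state, pipeline throughput is the reciprocal of the slowest stage, which is why the recursion minimizes the maximum stage time rather than the sum.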
“…Platform (framework) | Testing workloads | Partitioning method | # devices
[18] Raspberry Pi 3B (DarkNet) | YOLOv2 | Fused tile partitioning | 1-6
[33] LG Nexus 5 (MxNet) | VGG-16 | Biased one-dimensional partition | 2-4
[19] Raspberry Pi 3B+ (DarkNet) | VGG-16, YOLO | Fused-layer parallelization | 1-8
[30] MinnowBoard, RCC-VE Network Board (PyTorch) | ViT-Base, ViT-Large, ViT-Huge | Layer-level splitting | 16
[31] Raspberry Pi 3B + Jetson TX2 (Keras-TensorFlow) | … | … | …
…as TensorFlow, Keras, MXNet, PyTorch, and DarkNet. In addition, TVM includes auto-tuner tools, i.e., autoTVM [37] and Ansor [38], to automatically apply graph-level and operator-level optimizations, e.g., operation fusion or data transformations, to network graphs, generating highly efficient machine code.…”
Section: B Motivation
confidence: 99%
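
As context for the TVM remark in the quote, here is a minimal sketch of the Relay build flow; the tiny conv+ReLU graph is invented for the example, and opt_level=3 is the standard setting under which TVM applies graph-level passes such as operator fusion.

```python
import tvm
from tvm import relay

# Invented toy graph: conv2d followed by ReLU, a classic fusion candidate.
x = relay.var("x", shape=(1, 3, 224, 224), dtype="float32")
w = relay.var("w", shape=(16, 3, 3, 3), dtype="float32")
y = relay.nn.relu(relay.nn.conv2d(x, w, padding=(1, 1)))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

# opt_level=3 enables graph-level optimizations, including operator fusion,
# before machine code is generated for the chosen target.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")
```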
“…Low-cost, low-energy devices, such as Raspberry Pi and Odroid, have also been employed in edge-cloud hybrid solutions with dynamic network conditions [15], [29]. Multi-node edge-edge solutions can also optimize network execution by applying layer fusion, data parallelism, or network partitioning over clusters of edge devices [18], [19], [30]-[33]. For instance, [34] proposes spatial and channel partitioning to parallelize convolutional layers across multiple devices using a dynamic programming-based search.…”
Section: Introduction
confidence: 99%
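
The spatial partitioning that the quote attributes to [34] can be sketched as follows. The shapes, the split point, and the two-worker setup are assumptions for illustration (the real system would also search the partition with dynamic programming and run the slices on separate devices): each slice carries one halo row so the 3x3 kernel sees correct borders, and the stitched result matches the unpartitioned convolution.

```python
import torch
import torch.nn.functional as F

# Invented shapes: one 3x3 conv, input split into two horizontal halves.
x = torch.randn(1, 3, 32, 32)
w = torch.randn(8, 3, 3, 3)
halo = 1  # a 3x3 kernel needs one extra border row from the neighbor

top = x[:, :, : 16 + halo, :]  # worker 0: rows 0..16 (includes halo row)
bot = x[:, :, 16 - halo :, :]  # worker 1: rows 15..31 (includes halo row)

# Each worker convolves its slice; rows contaminated by the wrong
# (zero-padded) border are dropped before stitching.
y_top = F.conv2d(top, w, padding=1)[:, :, :16, :]
y_bot = F.conv2d(bot, w, padding=1)[:, :, halo:, :]
y_split = torch.cat([y_top, y_bot], dim=2)

y_full = F.conv2d(x, w, padding=1)  # reference: unpartitioned conv
print(torch.allclose(y_split, y_full, atol=1e-5))  # expect True
```

Channel partitioning is analogous, splitting the weight tensor's output channels across workers instead of the input's spatial rows.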