“…Intermittent edge inference [21] describes how to optimize edge inference for energy use on edge devices, but focuses on compression and pruning of model layers with a specialized inference runtime. Jupiter [20] orchestrates execution of a task on geographically distributed compute nodes based on a given task graph. This framework takes compute time as the bottleneck and uses a dynamic-programming solution to minimize computation time when distributing the task graph.…”
Section: Edge Inference
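The excerpt above does not spell out Jupiter's dynamic-programming formulation. As a rough illustration only (not Jupiter's actual algorithm), the sketch below shows how such a DP can place a chain-structured task graph onto compute nodes to minimize end-to-end completion time, assuming per-task compute times and inter-node transfer times are given; all names here are hypothetical.

```python
import math

def place_chain(task_costs, transfer_cost):
    """Toy DP: assign a chain of tasks to nodes, minimizing end-to-end time.

    task_costs[t][k]    = compute time of task t on node k.
    transfer_cost[j][k] = time to move a task's output from node j to node k.
    Returns (best completion time, node assignment per task).
    """
    n_tasks = len(task_costs)
    n_nodes = len(task_costs[0])
    # dp[k] = best completion time of the prefix ending with the current task on node k
    dp = [task_costs[0][k] for k in range(n_nodes)]
    choice = [[None] * n_nodes]  # back-pointers for reconstructing the assignment

    for t in range(1, n_tasks):
        new_dp, back = [], []
        for k in range(n_nodes):
            best_prev, best_j = math.inf, None
            for j in range(n_nodes):
                cand = dp[j] + transfer_cost[j][k] + task_costs[t][k]
                if cand < best_prev:
                    best_prev, best_j = cand, j
            new_dp.append(best_prev)
            back.append(best_j)
        dp = new_dp
        choice.append(back)

    # Trace the optimal assignment back from the last task.
    best_k = min(range(n_nodes), key=lambda k: dp[k])
    assignment = [best_k]
    for t in range(n_tasks - 1, 0, -1):
        best_k = choice[t][best_k]
        assignment.append(best_k)
    assignment.reverse()
    return min(dp), assignment
```

For example, `place_chain([[2, 3], [4, 1]], [[0, 1], [1, 0]])` returns `(4, [0, 1])`: task 0 runs on node 0, task 1 on node 1, for a total time of 2 + 1 + 1 = 4.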
“…For each model, we ran Algorithm 3 with a certain number of nodes, number of bandwidth classes, and node memory capacity. We used the set of nodes [5,10,15,20,50]. We used the set of bandwidth classes [2,5,8,11,14,17,20]. We used the set of node memory capacities [64,128,256,512].…”
Section: Algorithm Simulations
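The quoted configuration amounts to a Cartesian sweep over the three parameter sets. A minimal sketch of that sweep is below; `run_algorithm_3` is a hypothetical stand-in for the paper's Algorithm 3, and the memory values are left in whatever units the paper uses.

```python
from itertools import product

# Parameter grid taken from the simulation setup quoted above.
NODE_COUNTS = [5, 10, 15, 20, 50]
BANDWIDTH_CLASSES = [2, 5, 8, 11, 14, 17, 20]
MEMORY_CAPACITIES = [64, 128, 256, 512]  # units as given in the excerpt

def sweep(models, run_algorithm_3):
    """Run Algorithm 3 once per (model, nodes, bandwidth classes, memory) combination.

    `run_algorithm_3(model, n_nodes, n_bw_classes, mem_capacity)` is a hypothetical
    callable standing in for the paper's Algorithm 3; here it is assumed to return
    the bottleneck latency achieved for that configuration.
    """
    results = {}
    for model, n_nodes, n_bw, mem in product(models, NODE_COUNTS,
                                             BANDWIDTH_CLASSES, MEMORY_CAPACITIES):
        results[(model, n_nodes, n_bw, mem)] = run_algorithm_3(model, n_nodes, n_bw, mem)
    return results
```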
Edge inference has become more widespread, with diverse applications ranging from retail to wearable technology. Clusters of networked, resource-constrained edge devices are becoming common, yet no system exists to split a DNN across such a cluster while maximizing the system's inference throughput. Additionally, no production-ready orchestration system exists for deploying these models over such edge networks with the robustness and scalability of the cloud. We present an algorithm that partitions DNNs and distributes them across a set of edge devices with the goal of minimizing the bottleneck latency and thereby maximizing inference throughput. The system scales well across different node memory capacities and numbers of nodes, and is node fault-tolerant. We find that we can reduce the bottleneck latency by 10x over a random algorithm and by 35% over a greedy joint partitioning-placement algorithm, although the joint-partitioning algorithm outperforms ours in most practical use cases. Furthermore, we find empirically that, for the set of representative models we tested, the algorithm produces results within 9.2% of the optimal bottleneck latency. We then developed a standalone cluster network emulator, on which we tested configurations of up to 20 nodes and found a steady increase in throughput and a decrease in end-to-end latency as the cluster size grows. In these tests, we observed that our system has multi-node fault tolerance as well as network and system I/O fault tolerance. We have implemented our framework as open-source software that is publicly available to the research community at https://github.com/ANRGUSC/SEIFER.
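The objective stated in the abstract, minimizing bottleneck latency to maximize inference throughput, follows from the usual pipeline model: with partitions executing as pipeline stages on different nodes, steady-state throughput is the reciprocal of the slowest stage. The sketch below illustrates that relationship under the assumption that each stage's latency is its partition's compute time plus its outgoing transfer time; the paper's exact cost model may differ, and the function names are illustrative.

```python
def bottleneck_latency(compute_times, transfer_times):
    """Bottleneck latency of a pipelined partition: the slowest single stage,
    where a stage is one partition's compute time plus the time to send its
    output to the next node (no transfer after the last partition)."""
    stages = [c + t for c, t in zip(compute_times, transfer_times + [0.0])]
    return max(stages)

def steady_state_throughput(compute_times, transfer_times):
    """In steady state, the pipeline completes one inference per bottleneck period."""
    return 1.0 / bottleneck_latency(compute_times, transfer_times)

# Example: three partitions; the middle stage (0.030 + 0.020 = 0.050 s) is the
# bottleneck, so steady-state throughput is 1 / 0.050 = 20 inferences per second.
print(steady_state_throughput([0.010, 0.030, 0.015], [0.005, 0.020]))
```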