“…These do not account for communication demands on the edge. Others abstract model layers into certain "execution units" [10, 29], which they then choose to slice based on certain resource requirements. Li et al. [28] regressively predict a layer's latency demand and optimize communication bandwidth accordingly.…”
Section: Edge Inference (mentioning)
confidence: 99%
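As an illustration of the regression-based latency prediction attributed to Li et al. [28] in the snippet above, the sketch below fits a per-layer latency model from static layer features. The choice of features (FLOPs, parameter count, activation size), the linear model, and all numbers are illustrative assumptions, not the cited paper's formulation.

```python
# Hedged sketch: per-layer latency regression from static layer features.
# Features, model form, and numbers are illustrative, not from [28].
import numpy as np

# Profiled layers: [FLOPs (G), params (M), output activation size (MB)]
X = np.array([
    [1.8, 0.1, 3.1],
    [3.7, 0.3, 1.5],
    [7.2, 1.2, 0.8],
    [11.3, 2.4, 0.4],
])
y = np.array([4.2, 7.9, 14.6, 22.1])  # measured per-layer latency (ms), synthetic

# Fit latency ~ w . features + b by least squares
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_latency(flops_g, params_m, act_mb):
    """Predict a layer's latency (ms) from its static features."""
    return float(np.array([flops_g, params_m, act_mb, 1.0]) @ coef)

print(predict_latency(5.0, 0.8, 1.0))
```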
“…For each model, we ran Algorithm 3 with a certain number of nodes, number of bandwidth classes, and node memory capacity. We used the set of nodes [5,10,15,20,50]. We used the set of bandwidth classes [2,5,8,11,14,17,20].…”
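The experiment sweep described in the snippet above can be pictured as running the partitioning algorithm over every combination of node count and bandwidth-class count for each model. In the sketch below, run_algorithm_3 is a hypothetical placeholder and the memory capacities are assumed values; only the node and bandwidth-class sets come from the quoted text.

```python
# Hedged sketch of the parameter sweep quoted above. `run_algorithm_3` and
# MEMORY_CAPACITIES_MB are placeholders/assumptions, not the paper's code.
from itertools import product

NODE_COUNTS = [5, 10, 15, 20, 50]               # from the quoted text
BANDWIDTH_CLASSES = [2, 5, 8, 11, 14, 17, 20]   # from the quoted text
MEMORY_CAPACITIES_MB = [256, 512]               # assumed values

def run_algorithm_3(model_name, num_nodes, num_bw_classes, mem_mb):
    """Placeholder for the paper's Algorithm 3 (partitioning/placement).
    Substitute the real implementation; this stub only echoes its inputs."""
    return {"model": model_name, "nodes": num_nodes,
            "bandwidth_classes": num_bw_classes, "memory_mb": mem_mb}

def sweep(model_names):
    """Run the placeholder algorithm over the full configuration grid."""
    results = {}
    for model, n, bw, mem in product(model_names, NODE_COUNTS,
                                     BANDWIDTH_CLASSES, MEMORY_CAPACITIES_MB):
        results[(model, n, bw, mem)] = run_algorithm_3(model, n, bw, mem)
    return results

print(len(sweep(["resnet50", "vgg16"])))  # configurations evaluated
```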
Edge inference has become more widespread, as its diverse applications range from retail to wearable technology. Clusters of networked, resource-constrained edge devices are becoming common, yet no system exists to split a DNN across such a cluster while maximizing the inference throughput of the system. Additionally, no production-ready orchestration system exists for deploying these models over such edge networks with the robustness and scalability of the cloud. We present an algorithm that partitions DNNs and distributes them across a set of edge devices with the goal of minimizing the bottleneck latency and therefore maximizing inference throughput. The system scales well across different node memory capacities and numbers of nodes, while remaining node fault-tolerant. We find that we can reduce the bottleneck latency by 10x over a random algorithm and by 35% over a greedy joint partitioning-placement algorithm, although the joint-partitioning algorithm outperforms our algorithm in most practical use cases. Furthermore, we find empirically that, for the set of representative models we tested, the algorithm produces results within 9.2% of the optimal bottleneck latency. We then developed a standalone cluster network emulator on which we tested configurations of up to 20 nodes, and found a steady increase in throughput and a decrease in end-to-end latency as the cluster size scales. In these tests, we observed that our system has multi-node fault tolerance as well as network and system I/O fault tolerance. We have implemented our framework in open-source software that is publicly available to the research community at https://github.com/ANRGUSC/SEIFER.
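The link the abstract draws between bottleneck latency and throughput can be made concrete with a small sketch: in a pipelined deployment, steady-state throughput is the reciprocal of the slowest stage's latency, so minimizing the bottleneck maximizes throughput. The stage model below (per-partition compute time plus outgoing transfer time) is an illustrative assumption, not the paper's algorithm.

```python
# Hedged sketch of the objective: throughput is limited by the slowest
# pipeline stage. Stage cost model and numbers are illustrative assumptions.

def bottleneck_latency(compute_s, transfer_s):
    """compute_s[i]: compute time of partition i on its node (seconds).
    transfer_s[i]: time to send partition i's output to the next node."""
    assert len(compute_s) == len(transfer_s)
    return max(c + t for c, t in zip(compute_s, transfer_s))

def pipeline_throughput(compute_s, transfer_s):
    """Steady-state inferences per second, bounded by the slowest stage."""
    return 1.0 / bottleneck_latency(compute_s, transfer_s)

# Example: three partitions; the middle stage dominates the pipeline.
print(pipeline_throughput([0.010, 0.025, 0.015], [0.004, 0.002, 0.0]))
```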
The paper focuses on the challenges of deploying deep neural networks (DNNs) for recognizing traffic objects using the camera of Android smartphones. The main objective of this research is resource awareness: efficient use of computational resources while maintaining high recognition accuracy. To achieve this, a methodology is proposed that leverages the Edge-to-Fog paradigm to distribute the inference workload across multiple tiers of the distributed system architecture. The evaluation was conducted on a dataset comprising real-world traffic scenarios and diverse traffic objects. The main findings highlight the feasibility of deploying DNNs for traffic object recognition on resource-constrained Android smartphones. The proposed Edge-to-Fog methodology demonstrated improvements in both recognition accuracy and resource utilization, as well as the viability of both edge-only and edge-fog approaches. Moreover, the experimental results showcased the adaptability of the system to dynamic traffic scenarios, ensuring real-time recognition performance even in challenging environments.
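As a rough illustration of the edge-only versus edge-fog trade-off this abstract evaluates, the sketch below picks whichever tier gives the lower estimated end-to-end latency for a frame. The decision rule, parameter names, and all numbers are assumptions for illustration; the paper's actual methodology is not spelled out in this summary.

```python
# Hedged sketch of an edge-vs-fog dispatch decision. Rule and numbers are
# illustrative assumptions, not the paper's methodology.

def choose_tier(on_device_latency_s, fog_compute_latency_s,
                payload_mb, uplink_mbps):
    """Return 'edge' or 'fog', whichever yields lower end-to-end latency."""
    transfer_s = (payload_mb * 8.0) / uplink_mbps  # MB -> Mb, divided by Mbps
    fog_total_s = transfer_s + fog_compute_latency_s
    return "edge" if on_device_latency_s <= fog_total_s else "fog"

# Example: a 0.5 MB frame over a 20 Mbps uplink; slow phone, fast fog node.
print(choose_tier(on_device_latency_s=0.30, fog_compute_latency_s=0.05,
                  payload_mb=0.5, uplink_mbps=20.0))
```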