2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)
DOI: 10.1109/icdcs47774.2020.00114
PerDNN: Offloading Deep Neural Network Computations to Pervasive Edge Servers

Cited by 14 publications (10 citation statements)
References 33 publications
“…Reducto [45] uses an on-camera filtering technique to filter out frames that do not contain relevant information for the query. Partition-based methods [38,40,78] explore partitioning the DNNs over the edge and cloud to fully utilize the computation resources on both sides. They automatically divide a DNN model into two partitions and deploy the first few layers on the device to improve inference efficiency.…”
Section: Related Work
confidence: 99%
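As a concrete illustration of the two-partition scheme these methods describe, the sketch below splits a sequential model into a device-side head and a server-side tail. It is a minimal sketch assuming a PyTorch-style nn.Sequential model; the layer stack and the split_model helper are illustrative assumptions, not taken from any of the cited systems.

```python
import torch
import torch.nn as nn

# Toy sequential model; the layer stack is illustrative only.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 32 * 32, 10),
)

def split_model(model: nn.Sequential, split: int):
    """Divide a sequential model into a device-side head and a server-side tail."""
    layers = list(model.children())
    return nn.Sequential(*layers[:split]), nn.Sequential(*layers[split:])

head, tail = split_model(model, split=4)   # first four layers stay on the device
x = torch.randn(1, 3, 32, 32)
activation = head(x)        # computed locally on the device
output = tail(activation)   # shipped to and finished on the edge/cloud server
print(output.shape)         # torch.Size([1, 10])
```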
“…It partitions DNN models based on a penalty factor to reduce the uploading overhead, and it is the most popular approach evaluated in practice. The third is PerDNN [23]: it is a recent work on DNN offloading. It uses the GPU statistics of servers to partition DNN models to minimize the execution latency between the client and the edge server.…”
Section: Workloads
confidence: 99%
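The description above implies a latency-driven search over candidate split points. The following sketch shows one plausible form of that search, under stated assumptions: per-layer client/server timings are already profiled, and server contention is modeled by a crude linear scaling of GPU utilization. The best_split helper, the numbers, and the load model are hypothetical, not PerDNN's actual algorithm.

```python
def best_split(client_ms, server_ms, out_bytes, input_bytes, bandwidth_bps, gpu_util):
    """Return the layer boundary k minimizing end-to-end latency (ms).

    Layers 0..k-1 run on the client, layers k.. run on the server;
    gpu_util in [0, 1) crudely inflates server-side times under load.
    """
    n = len(client_ms)
    best_k, best_lat = 0, float("inf")
    for k in range(n + 1):
        client = sum(client_ms[:k])
        if k == n:
            transfer = 0.0  # fully local: nothing is uploaded
        else:
            sent = input_bytes if k == 0 else out_bytes[k - 1]
            transfer = sent * 8 / bandwidth_bps * 1000.0  # bits / (bits/s) -> ms
        server = sum(server_ms[k:]) / max(1e-6, 1.0 - gpu_util)
        total = client + transfer + server
        if total < best_lat:
            best_k, best_lat = k, total
    return best_k, best_lat

# Hypothetical four-layer profile against a busy server (60% GPU utilization).
k, latency = best_split(
    client_ms=[12.0, 8.0, 20.0, 5.0],
    server_ms=[1.5, 1.0, 2.5, 0.6],
    out_bytes=[400_000, 200_000, 50_000, 4_000],
    input_bytes=600_000,
    bandwidth_bps=50_000_000,  # 50 Mbps uplink
    gpu_util=0.6,
)
print(f"split before layer {k}: {latency:.1f} ms end-to-end")
```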
“…(1) In order to push the computing tasks of DNN applications entirely to the network edge, we studied the challenges of executing DNN applications in collaboration with multiple edge nodes and found problems that can delay their execution, such as concurrency conflict exceptions, network jitter, and deadlocks. [10][11][12][17][18][19][20][21][22][23][24] offload all (i.e., cloud-only) or part (i.e., edge-cloud collaboration) of the DNN computation to the cloud server. Neurosurgeon [10] dynamically divides the DNN model into a front part and a rear part.…”
Section: Introduction
confidence: 99%
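A minimal sketch of the front/rear execution pattern attributed to Neurosurgeon above: the client runs the front partition, serializes the intermediate activation, and the server completes the rear partition. The in-memory byte buffer stands in for a real network transport, and the tiny model with its client_side and server_side helpers are illustrative assumptions, not the cited system's code.

```python
import io
import torch
import torch.nn as nn

# Hypothetical front/rear partitions of a small model.
front = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
rear = nn.Sequential(nn.Flatten(), nn.Linear(8 * 32 * 32, 10))

def client_side(x: torch.Tensor) -> bytes:
    """Run the front part locally and serialize its output for upload."""
    buf = io.BytesIO()
    torch.save(front(x), buf)
    return buf.getvalue()

def server_side(payload: bytes) -> torch.Tensor:
    """Deserialize the intermediate activation and finish the rear part."""
    activation = torch.load(io.BytesIO(payload))
    return rear(activation)

# The byte string here would travel over the network in a real deployment.
logits = server_side(client_side(torch.randn(1, 3, 32, 32)))
print(logits.shape)  # torch.Size([1, 10])
```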
“…In the literature, some recent works exploit MEC for computational offloading of DNN tasks under a single WD setup [11]-[21] or a multi-WD setup [21], [22]. The DNN task offloading and resource allocation schemes are designed to optimize the WD's energy consumption [12]-[14], the DNN inferencing accuracy [15], [16], and the DNN inferencing time [17]-[20]. The profiling knowledge of layer-wise DNN inferencing delay/energy consumption, which heavily depends on the MEC system parameters, is determined in either an offline manner [12]-[18] or by an online learning approach [19], [20].…”
Section: Introduction
confidence: 99%
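To make the offline layer-wise profiling mentioned above concrete, the sketch below times each layer's forward pass in isolation, producing the kind of per-layer latency profile that such offloading schemes consume. The toy model and the profile_layers helper are assumptions for illustration; a production profiler would also handle warm-up runs and GPU synchronization.

```python
import time
import torch
import torch.nn as nn

# Toy model to profile; the layer stack is illustrative only.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 32 * 32, 10),
)

def profile_layers(model: nn.Sequential, x: torch.Tensor, repeats: int = 20):
    """Return mean per-layer forward latency in milliseconds."""
    profile = []
    with torch.no_grad():
        for layer in model:
            start = time.perf_counter()
            for _ in range(repeats):
                y = layer(x)
            elapsed_ms = (time.perf_counter() - start) / repeats * 1000.0
            profile.append((type(layer).__name__, elapsed_ms))
            x = y  # the next layer consumes this layer's output
    return profile

for name, ms in profile_layers(model, torch.randn(1, 3, 32, 32)):
    print(f"{name:10s} {ms:6.3f} ms")
```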