Akshay Dua scite author profile

Ren

2020

Object detection has made great progress, driven by the development of deep learning. Compared with the widely studied task of classification, object detection generally needs one or two orders of magnitude more FLOPs (floating point operations) for inference. To enable practical applications, it is essential to explore effective runtime-accuracy trade-off schemes. Recently, a growing number of studies have targeted object detection on resource-constrained devices, such as YOLOv1, YOLOv2, SSD, and MobileNetv2-SSDLite [11,14,16], whose accuracy on COCO test-dev [8] is around 22-25% mAP (mAP-20-tier). In contrast, very few studies discuss the computation-accuracy trade-off for mAP-30-tier detection networks. In this paper, we illustrate why RetinaNet gives an effective computation-accuracy trade-off for object detection and how to build a light-weight RetinaNet. We propose to reduce FLOPs only in the computationally intensive layers and keep the other layers the same. Compared with the most common way of trading FLOPs for accuracy, input image scaling, the proposed solution shows a consistently better FLOPs-mAP trade-off line. Quantitatively, the proposed method results in a 0.1% mAP improvement at 1.15x FLOPs reduction and a 0.3% mAP improvement at 1.8x FLOPs reduction.
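
As a rough illustration of the FLOPs side of the trade-off described in the abstract above, the sketch below counts multiply-accumulate operations for a single convolution layer and compares two ways of cutting its cost: scaling the input resolution, and thinning only that layer. The layer shape and the helper name `conv_macs` are hypothetical, chosen for illustration and not taken from the paper, and the sketch says nothing about the resulting accuracy.

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a k x k convolution (stride 1, 'same' padding)."""
    return h * w * c_in * c_out * k * k

# Hypothetical layer: 64x64 feature map, 256 -> 256 channels, 3x3 kernel.
base = conv_macs(64, 64, 256, 256, 3)

# Option A: scale the input so the feature map shrinks to 48x48 (0.75x per side).
scaled_input = conv_macs(48, 48, 256, 256, 3)

# Option B: keep the resolution, halve the output channels of this one layer.
thinner_layer = conv_macs(64, 64, 256, 128, 3)

print(base / scaled_input)   # cost reduction factor from input scaling
print(base / thinner_layer)  # cost reduction factor from thinning the layer
```

Input scaling reduces the cost of every layer quadratically in the scale factor, whereas thinning a layer reduces cost only where it is applied, which is why the second option can be restricted to the computationally intensive layers.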

Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing

Ren

2020

This paper presents Systolic-CNN, an OpenCL-defined, scalable, run-time-flexible FPGA accelerator architecture optimized for accelerating the inference of various convolutional neural networks (CNNs) in multi-tenancy cloud/edge computing. Existing OpenCL-defined FPGA accelerators for CNN inference fall short because of limited flexibility in supporting multiple CNN models at run time and poor scalability, which results in underutilized FPGA resources and limited computational parallelism. Systolic-CNN adopts a highly pipelined and parallel 1-D systolic array architecture, which efficiently exploits both spatial and temporal parallelism for accelerating CNN inference on FPGAs. Systolic-CNN is highly scalable and parameterized, and users can easily adapt it to achieve up to 100% utilization of the coarse-grained computation resources (i.e., DSP blocks) of a given FPGA. Systolic-CNN is also run-time-flexible in the context of multi-tenancy cloud/edge computing: it can be time-shared to accelerate a variety of CNN models at run time without recompiling the FPGA kernel hardware or reprogramming the FPGA. Experimental results on Intel Arria/Stratix 10 GX FPGA development boards show that the optimized single-precision implementation of Systolic-CNN achieves an average inference latency of 7ms/2ms, 84ms/33ms, 202ms/73ms, 1615ms/873ms, and 900ms/498ms per image for AlexNet, ResNet-50, ResNet-152, RetinaNet, and Light-weight RetinaNet, respectively. Code is available at https://github.com/PSCLab-ASU/Systolic-CNN. CCS CONCEPTS • Hardware → Hardware accelerators; • Computer systems organization → Neural networks.

Edge Computing Accelerated Defect Classification Based on Deep Convolutional Neural Network With Application in Rolling Image Inspection

Huang

Sergin

et al. 2020

This paper develops a unified framework for training and deploying deep neural networks on an edge computing platform for image defect detection and classification. In the proposed framework, we combine transfer learning and data augmentation to improve accuracy given the small sample size. We further implement the edge computing framework to satisfy the real-time computational requirement. After implementing the proposed model in a rolling manufacturing system, we conclude that deep learning approaches can perform around 30–40% better than some traditional machine learning algorithms, such as random forest, decision tree, and SVM, in terms of prediction accuracy. Furthermore, by deploying the CNNs in the edge computing framework, we can significantly reduce the computation time and satisfy the real-time computational requirement of the high-speed rolling and inspection system. Finally, saliency map and embedding layer visualization techniques are used to better understand the proposed deep learning models.

Combating Software and Sybil Attacks to Data Integrity in Crowd-Sourced Embedded Systems

ACM Trans. Embed. Comput. Syst.

Bulusu

Feng

et al. 2014

Crowd-sourced mobile embedded systems allow people to contribute sensor data for critical applications, including transportation, emergency response, and eHealth. Data integrity becomes imperative, as malicious participants can launch software and Sybil attacks that modify the sensing platform and data. To address these attacks, we develop (i) a Trusted Sensing Peripheral (TSP) enabling collection of high-integrity raw or aggregated data, and participation in applications requiring additional modalities; and (ii) a Secure Tasking and Aggregation Protocol (STAP) enabling aggregation of TSPs' trusted readings by untrusted intermediaries, while efficiently detecting fabricators. Evaluations demonstrate that TSP and STAP are practical and energy-efficient.