In recent times, the trends in the very large scale integration (VLSI) industry have become multi-dimensional: reduced energy consumption, smaller silicon footprint, precise results, lower power dissipation, faster response, and so on. To meet these needs, hardware architectures must be reliable and robust against these challenges. Recently, neural networks and deep learning have begun to reshape the research paradigm significantly; such models involve parameters on the order of millions, nonlinear activation functions, convolutional operations for feature extraction, softmax regression for classification, generative adversarial networks, and more. These operations impose enormous computation and memory overheads. Presently available DSP processors are incapable of performing these operations efficiently, and they commonly suffer from memory bottlenecks, performance degradation, and compromised accuracy. Moreover, if a large silicon area is powered simultaneously to accelerate computation in parallel, the considerable heat generated puts the ICs at significant risk of burning out. Hence, the dark silicon constraint has been introduced to reduce heat dissipation without sacrificing accuracy. Similarly, various algorithms have been adapted to design DSP processors suited to fast execution of neural networks, activation functions, convolutional neural networks, and generative adversarial networks. In this review, we illustrate recent developments in hardware that accelerate the efficient implementation of deep learning networks with enhanced performance. The techniques investigated in this review are expected to guide future research on hardware optimization for high-performance computing.