2021
DOI: 10.1109/access.2021.3077596
Compressing Neural Networks With Inter Prediction and Linear Transformation

Abstract: Because of resource-constrained environments, network compression has become an essential part of deep neural network research. In this paper, we identify a mutual relationship between kernel weights, termed Inter-Layer Kernel Correlation (ILKC): the kernel weights of two different convolution layers share substantial similarity in shape and value. Based on this relationship, we propose a new compression method, Inter-Layer Kernel Prediction (ILKP), which represents convolutional kernels with fewer bi…
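The idea behind inter-layer kernel prediction can be illustrated with a small sketch: if one layer's kernels are approximately a linear transformation of another layer's, only the transform parameters and a low-energy residual need to be stored. This is a minimal illustration under assumed conditions (synthetic correlated kernels, a single scalar affine transform, a crude 4-bit residual quantizer), not the paper's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical kernels from two conv layers, shape (out_ch, in_ch, 3, 3).
# Layer B is assumed to be correlated with layer A, as ILKC observes.
k_a = rng.standard_normal((8, 8, 3, 3)).astype(np.float32)
k_b = 0.9 * k_a + 0.05 * rng.standard_normal((8, 8, 3, 3)).astype(np.float32)

def predict_with_linear_transform(ref, target):
    """Least-squares fit target ~ alpha * ref + beta; return params and residual."""
    alpha, beta = np.polyfit(ref.ravel(), target.ravel(), 1)
    residual = target - (alpha * ref + beta)
    return alpha, beta, residual

alpha, beta, residual = predict_with_linear_transform(k_a, k_b)

# The residual carries far less energy than the original kernel, so it can be
# stored with fewer bits; here a uniform 4-bit quantizer stands in for that step.
scale = np.abs(residual).max() / 7.0
q = np.clip(np.round(residual / scale), -8, 7).astype(np.int8)
k_b_rec = alpha * k_a + beta + q * scale

err = np.abs(k_b - k_b_rec).max()
print(f"alpha={alpha:.3f}, max reconstruction error={err:.4f}")
```

Instead of 32-bit floats for every weight of layer B, this stores two floats (alpha, beta) plus 4-bit residuals, while the reconstruction error stays bounded by half a quantization step.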

Cited by 4 publications (3 citation statements). References 24 publications.
“…From there, AGB as the AXI master generates signals for AXI4 burst transmission according to Eq. (7)-(9), where Burst len is 1, . .…”
Section: Accelerator Architecture
confidence: 99%
“…Most existing FPGA-based CNN inference implementations use DDR, which has narrow bandwidth, for the off-chip memory. Compression mechanisms [9], [10] or quantization [11]-[13] have been applied for low numerical precision to reduce the off-chip memory bandwidth pressure. In addition, using on-chip buffers (e.g., [14]-[16]) is a popular solution for limiting memory bandwidth.…”
Section: Introduction
confidence: 99%