An Efficient CNN Architecture for Image Classification on FPGA Accelerator

Mujawar, Shahmustafa; Kiran, Divya; Ramasangu, Hariharan

doi:10.1109/icaecc.2018.8479517

Cited by 14 publications

(9 citation statements)

References 4 publications

“…Table 8 compares our best architecture (Conv mixed-24) with existing works, which confirms that our architecture can substantially reduce hardware resources than the existing FPGA accelerators [28, 33, 34, 35].…”

Section: Results and Analysis (supporting)

confidence: 61%

“…The energy consumption per image in the proposed accelerator is only 8.5 uJ, while it is 17.4 uJ in our previous accelerator [33]. Our energy per image is 1140, 81, and 555 times lower than the previous works [34], [28] and [35], respectively.…”

Section: Results and Analysis (mentioning)

confidence: 54%
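
The ratios quoted in the statement above can be turned back into absolute figures with a few lines of arithmetic. The following is a minimal sketch, assuming the quoted ratios are exact multiples of the 8.5 uJ figure; the function name and dictionary are illustrative, and the resulting values are implied by the quote rather than reported by the cited works.

```python
# Figures quoted in the citation statement (energy per image, in microjoules).
PROPOSED_UJ = 8.5
PREVIOUS_UJ = 17.4  # the citing authors' earlier accelerator [33]

# "Times lower" ratios quoted for the other compared works.
QUOTED_RATIOS = {"[34]": 1140, "[28]": 81, "[35]": 555}


def implied_energy_uj(proposed_uj: float, ratio: float) -> float:
    """Energy per image implied by a 'ratio times lower' claim (hypothetical helper)."""
    return proposed_uj * ratio


# Implied energy per image for each compared work, in microjoules.
implied = {ref: implied_energy_uj(PROPOSED_UJ, r) for ref, r in QUOTED_RATIOS.items()}

# Improvement over the earlier accelerator [33], as a plain ratio.
improvement_over_previous = PREVIOUS_UJ / PROPOSED_UJ

if __name__ == "__main__":
    for ref, energy in implied.items():
        print(f"{ref}: about {energy / 1000:.2f} mJ per image")
    print(f"[33]: {improvement_over_previous:.2f}x higher than the proposed design")
```

Read this way, the quoted ratios place the compared works in the range of roughly 0.7 mJ to 10 mJ per image, and the earlier accelerator [33] at about twice the energy of the proposed one.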

“…Floating-point operation demonstrates superior accuracy over fixed-point operation when employed in training neural networks [25, 26]. Conventional neural network circuit design studies have been conducted using floating-point operations provided by GPUs or fixed-point computation hardware [27, 28]. However, most of the existing floating-point-based neural networks are limited to inference operation, and only a few incorporate training engines that are aimed at high-speed servers, not low-power mobile devices.…”

Section: Introduction (mentioning)

confidence: 99%

Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors

Junaid, Arslan, Lee, et al. 2022

Sensors

The convergence of artificial intelligence (AI) is one of the critical technologies in the recent fourth industrial revolution. The AIoT (Artificial Intelligence Internet of Things) is expected to be a solution that aids rapid and secure data processing. While the success of AIoT demanded low-power neural network processors, most of the recent research has been focused on accelerator designs only for inference. The growing interest in self-supervised and semi-supervised learning now calls for processors offloading the training process in addition to the inference process. Incorporating training with high accuracy goals requires the use of floating-point operators. The higher precision floating-point arithmetic architectures in neural networks tend to consume a large area and energy. Consequently, an energy-efficient/compact accelerator is required. The proposed architecture incorporates training in 32 bits, 24 bits, 16 bits, and mixed precisions to find the optimal floating-point format for low-power and smaller-sized edge devices. The proposed accelerator engines have been verified on FPGA for both inference and training of the MNIST image dataset. The combination of 24-bit custom FP format with 16-bit Brain FP has achieved an accuracy of more than 93%. ASIC implementation of this optimized mixed-precision accelerator using TSMC 65nm reveals an active area of 1.036 × 1.036 mm² and energy consumption of 4.445 µJ per training of one image. Compared with 32-bit architecture, the size and the energy are reduced by 4.7 and 3.91 times, respectively. Therefore, the CNN structure using floating-point numbers with an optimized data path will significantly contribute to developing the AIoT field that requires a small area, low energy, and high accuracy.
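
The 16-bit Brain FP (bfloat16) format mentioned in the abstract keeps the sign bit and all 8 exponent bits of an IEEE 754 single-precision value and shortens the mantissa to 7 bits, so it corresponds to the upper 16 bits of a 32-bit float. The following is a minimal sketch of that relationship, assuming simple truncation instead of rounding; the function names are illustrative, and the bit layout of the paper's 24-bit custom format is not given in the abstract, so it is not modeled here.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x by truncating its float32 encoding.

    Assumption: the low 16 mantissa bits are simply dropped (round toward zero).
    Real hardware usually rounds to nearest even instead.
    """
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    bits32 = (bits16 & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", bits32))[0]


if __name__ == "__main__":
    for value in (1.0, -2.0, 3.14159):
        pattern = float32_to_bfloat16_bits(value)
        print(f"{value!r} -> 0x{pattern:04X} -> {bfloat16_bits_to_float(pattern)!r}")
```

Because the exponent field is unchanged, bfloat16 covers the same dynamic range as a 32-bit float and gives up only precision, which is why 3.14159 comes back as 3.140625 in this sketch.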

“…When the image resolution is higher, a common practice is to split the image into sub-blocks and use the sub-images for training the CNN model to perform defect inspection in metal AM [11]. The 200 × 200 image size used in our paper is comparable to literature with CPU-GPU approaches, and is considered the state-of-the-art comparing to other FPGA-based implementations (e.g., 28 × 28 in [16], 32 × 32 and 48 × 48 in [17]). Zhu et al [9] applied images with the size of 120 × 80 pixels to train the CNN for classification of weld surface defects.…”

Section: Research Methods (mentioning)

confidence: 99%

“…Mujawar et al [16] proposed a 3-layer CNN architecture targeting at written digits recognition application on the MNIST dataset and implemented it in Artix-7 FPGAs. The authors also optimized the architecture by using loop-level parallel processing.…”

Section: Introduction (mentioning)

confidence: 99%

FPGA-Based Acceleration on Additive Manufacturing Defects Inspection

Luo, Chen 2021

Sensors

Additive manufacturing (AM) has gained increasing attention over the past years due to its fast prototype, easier modification, and possibility for complex internal texture devices when compared to traditional manufacture processing. However, potential internal defects are occurring during AM processes, and it requires real-time inspections to minimize the costs by either aborting the processing or repairing the defect. In order to perform the defects inspection, first the defects database NEU-DET is used for training. Then, a convolution neural network (CNN) is applied to perform defects classification. For real-time purposes, Field Programmable Gate Arrays (FPGAs) are utilized for acceleration. A binarized neural network (BNN) is proposed to best fit the FPGA bit operations. Finally, for the image labeled with defects, the selective search and non-maximum algorithms are implemented to help locate the coordinates of defects. Experiments show that the BNN model on NEU-DET can achieve 97.9% accuracy in identifying whether the image is defective or defect-free. As for the image classification speed, the FPGA-based BNN module can process one image within 0.5 s. The BNN design is modularized and can be duplicated in parallel to fully utilize logic gates and memory resources in FPGAs. It is clear that the proposed FPGA-based BNN can perform real-time defects inspection with high accuracy and it can easily scale up to larger FPGA implementations.
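
The abstract's remark that a BNN fits FPGA bit operations is commonly explained through the XNOR-popcount form of a binary dot product: with weights and activations restricted to +1 and -1 and stored one bit each, the dot product reduces to counting the positions where the bits agree. The following is a minimal sketch of that general technique, not the cited paper's implementation; the function names and the bit encoding (1 for +1, 0 for -1) are assumptions.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int; bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binarized values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element +1/-1 vectors given in packed form.

    XNOR marks the positions where the signs agree (product +1); every other
    position contributes -1, so the result is matches - (n - matches).
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR restricted to the n valid bits
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    packed = binary_dot(pack_signs(a), pack_signs(b), len(a))
    reference = sum(x * y for x, y in zip(a, b))
    print(packed, reference)
```

On an FPGA the XNOR and popcount steps map onto lookup tables and small adder trees with no multipliers, which is consistent with the abstract's claim that the design can be duplicated in parallel.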
