2019
DOI: 10.1109/access.2019.2907261

Deep Neural Network Hardware Implementation Based on Stacked Sparse Autoencoder

Abstract: Deep learning techniques have gained prominence in research in recent years; however, deep learning algorithms have a high computational cost, making them hard to apply in many commercial applications. In response, new alternatives have been studied, and methodologies for accelerating complex algorithms, including those based on reconfigurable hardware, have shown significant results. The objective of this paper is therefore to propose a neural network hardware i…

Cited by 37 publications (44 citation statements)
References 20 publications
Citing publications span 2019–2023.
“…This therefore allows applications to achieve real-time or near real-time processing. The FPGA allows the exploitation of the algorithm parallelization and the development of dedicated hardware to obtain performance improvement [9][10][11][12][13][14][15]. However, FPGA implementations found in the literature are often developed with sequential processing schemes in some stages of the Otsu algorithm, limiting the hardware's processing speed [16][17][18][19][20][21].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
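For context on the quoted passage: Otsu's method scores every candidate gray level by between-class variance, and it is these per-level evaluations that dedicated parallel hardware can compute concurrently, whereas a sequential scheme must iterate over them. A minimal software sketch of the algorithm follows (a Python/NumPy reference implementation for illustration, not the cited FPGA design):

import numpy as np

def otsu_threshold(image):
    # Histogram and probability mass of an 8-bit grayscale image.
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # class-0 probability up to each level
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean up to each level
    mu_total = mu[-1]
    # Between-class variance for every candidate threshold t:
    # sigma_B^2(t) = (mu_total * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t)))
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)        # undefined where one class is empty
    return int(np.argmax(sigma_b))

The 256 variance evaluations in the loop above are independent of one another, which is the parallelization opportunity the quoted passage attributes to FPGA implementations.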
“…However, the path to optimal implementation of DNN topologies on FPGAs remains complex, requiring expertise in several areas, DL algorithms and topologies, embedded and reconfigurable computing. Custom design can produce the best performance solutions, but it is an optimization that takes time and lacks flexibility [29], [35]. In this context, tools are available but mostly oriented towards mainframe applications, such as Intel Open Vino for Arria 10 GX [36] and Vitis-AI cards for Alveo or UltraScale available in collaborative environments such as Amazon Web Services EC2-F1 [37].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…In [8] sparse autoencoder architecture with network architecture of 196 input and output neurons along with 100 hidden neurons is implemented using Verilog HDL. DNN hardware realization using a technique called SSAE was implemented in [9] which used a concept of a systolic array that allows the use of many neurons and various layers. With the help of related works, we propose an HT detection which provides better accuracy at faster processing time.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
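To make the quoted 196-100-196 topology concrete, here is an assumed software analogue in Python/NumPy (the cited designs are implemented in Verilog HDL; the weights below are untrained placeholders, not parameters from either paper). Each layer is a matrix–vector multiply-accumulate followed by an activation, the regular dataflow that a systolic array pipelines across many neurons and layers:

import numpy as np

rng = np.random.default_rng(0)

# Placeholder parameters for the 196-100-196 topology quoted above.
W1 = rng.standard_normal((100, 196)) * 0.1   # encoder weights: 196 inputs -> 100 hidden
b1 = np.zeros(100)
W2 = rng.standard_normal((196, 100)) * 0.1   # decoder weights: 100 hidden -> 196 outputs
b2 = np.zeros(196)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    # Encode a 196-dim input (e.g. a flattened 14x14 patch) into 100 hidden
    # activations, then reconstruct it. Each line is the multiply-accumulate
    # pattern that a systolic array maps onto a grid of processing elements.
    h = sigmoid(W1 @ x + b1)       # hidden layer (100 neurons)
    x_hat = sigmoid(W2 @ h + b2)   # reconstruction (196 neurons)
    return h, x_hat

h, x_hat = forward(rng.random(196))

Because every neuron in a layer repeats the same multiply-accumulate over shared inputs, a systolic array can stream weights and activations through a fixed grid of processing elements, which is broadly how, as the quoted passage notes, such a design accommodates many neurons across several layers.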