2018
DOI: 10.3390/app8040504

An FPGA Implementation of a Convolutional Auto-Encoder

Abstract: In order to simplify the hardware design and reduce the resource requirements, this paper proposes a novel implementation of a convolutional auto-encoder (CAE) in a field programmable gate array (FPGA). Instead of the traditional framework realized in a layer-by-layer way, we designed a new periodic layer-multiplexing framework for the CAE. Only one layer is introduced and periodically reused to establish the network, which consumes fewer hardware resources. Moreover, by fixing the number of channels, thi…
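
The layer-multiplexing idea in the abstract can be illustrated in software: one compute block is reused each "period" with that period's weights loaded in, and the channel count is fixed so the same block fits every stage. The following is a minimal sketch; the channel count, kernel size, stage count, and pooling/upsampling choices are illustrative assumptions, not the paper's hardware parameters.

```python
# Software analogue of a layer-multiplexed CAE: a single "physical"
# convolution routine is reused every period with per-stage weights.
# All sizes below are assumptions for illustration only.
import torch
import torch.nn.functional as F

C, K = 8, 3                                   # fixed channel count, 3x3 kernels (assumed)

def conv_stage(x, w, b):
    """The one reused 'hardware' layer: convolution + ReLU."""
    return F.relu(F.conv2d(x, w, b, padding=K // 2))

# Per-stage parameter sets kept aside (in hardware: fetched from external memory).
stages = [(torch.randn(C, C, K, K) * 0.1, torch.zeros(C)) for _ in range(4)]

x = torch.rand(1, C, 32, 32)                  # toy input already at C channels
for i, (w, b) in enumerate(stages):
    x = conv_stage(x, w, b)                   # same compute block, new weights
    if i < len(stages) // 2:
        x = F.max_pool2d(x, 2)                # encoder half: downsample
    else:
        x = F.interpolate(x, scale_factor=2)  # decoder half: upsample
print(x.shape)                                # torch.Size([1, 8, 32, 32])
```
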

Cited by 10 publications (6 citation statements)
References: 25 publications
“…The extracted features are global and local features are ignored, yet local features are more important for wood texture recognition. Convolutional neural networks have the characteristics of local connection and weight sharing [35–40], which can accelerate the training of the network and facilitate the extraction of local features. The deep convolutional autoencoder designed in this paper is shown in Figure 2.…”
Section: Methods of the Local Feature Descriptor Based on the Convolu… (mentioning)
confidence: 99%
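
The local-connection and weight-sharing claim above can be made concrete with a quick parameter count. The sketch below (sizes are illustrative assumptions: a 64x64 feature map, 16 input and 16 output channels) compares a 3x3 convolution with the fully connected layer that would map the same feature map; the dense count is computed arithmetically rather than allocated.

```python
# Parameter count: 3x3 convolution (local, shared weights) vs. a fully
# connected layer mapping the same feature map (global, no sharing).
import torch.nn as nn

H = W = 64
c_in = c_out = 16

conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
conv_params = sum(p.numel() for p in conv.parameters())
dense_params = (c_in * H * W) * (c_out * H * W) + c_out * H * W

print(conv_params)    # 2,320
print(dense_params)   # 4,295,032,832
```
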
“…In this section, we compare our ternary AE with the most closely related state-of-the-art TNN implementations: TNN models for image classification implemented on customized ASIC and FPGA accelerators [28], [38], and an AE model with 8-bit precision for image compression [39]. In [28] and [38], the TNNs are optimized for resource efficiency and performance and use benchmarking datasets such as CIFAR100, SVHN, and GTSRB.…”
Section: Comparison with Existing Work (mentioning)
confidence: 99%
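
Ternary networks of the kind compared above constrain weights to {-1, 0, +1} times a scale. The sketch below shows one common threshold-based ternarization rule (TWN-style, with delta = 0.7 * mean|w|); it is a generic illustration, not the quantization scheme of any paper cited here.

```python
# Threshold-based ternarization: weights mapped to {-alpha, 0, +alpha}.
import torch

def ternarize(w):
    delta = 0.7 * w.abs().mean()                  # threshold from mean magnitude
    t = torch.zeros_like(w)
    t[w > delta] = 1.0
    t[w < -delta] = -1.0
    # Per-tensor scale: mean magnitude of the surviving weights.
    alpha = w.abs()[t != 0].mean() if (t != 0).any() else w.new_tensor(0.0)
    return alpha * t

w = torch.randn(16, 16, 3, 3)
print(torch.unique(ternarize(w)).numel())         # 3 distinct values
```
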
“…In our work, we calculate the computational roof and the maximum I/O memory bandwidth roof of the Xilinx ZYNQ 7100 computing platform according to Equation (10).…”
Section: The Roofline Model of ZYNQ 7100 (mentioning)
confidence: 99%
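
The roofline check referred to above can be reproduced generically: attainable throughput is the minimum of the platform's computational roof and the memory bandwidth roof scaled by the kernel's computation-to-communication ratio. The numbers below are placeholder assumptions, not values from the cited work or from its Equation (10).

```python
# Generic roofline bound: attainable GOPS = min(computational roof,
# CTC ratio * memory bandwidth). Platform figures are assumptions, not
# measured values for the Xilinx ZYNQ 7100.
def attainable_gops(comp_roof_gops, bandwidth_gbps, ops, bytes_moved):
    ctc = ops / bytes_moved                     # ops per byte moved off-chip
    return min(comp_roof_gops, ctc * bandwidth_gbps)

# Example: a layer doing 0.2 GOP while moving 0.05 GB of data,
# on an assumed 200 GOPS / 4.2 GB/s platform.
print(attainable_gops(200.0, 4.2, 0.2e9, 0.05e9))   # memory-bound: 16.8 GOPS
```
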
“…Because CNNs are computationally intensive, they are not well suited to general-purpose processors such as traditional CPUs. Many researchers have proposed CNN accelerators implemented on field-programmable gate arrays (FPGAs) [10,11], graphics processing units (GPUs) [3], and application-specific integrated circuits (ASICs) [12]. These accelerators provide an order-of-magnitude performance improvement and energy advantage over general-purpose processors [13].…”
Section: Introduction (mentioning)
confidence: 99%