In recent years, research in the space community has shown a growing interest in Artificial Intelligence (AI), mostly driven by systems miniaturization and commercial competition. In particular, the application of Deep Learning (DL) techniques on board Earth Observation (EO) satellites might lead to numerous advantages in terms of mitigation of downlink bandwidth constraints, cost reduction, and increased satellite autonomy. In this framework, the CloudScout project, funded by the European Space Agency (ESA), represents the first in-orbit demonstration of a Convolutional Neural Network (CNN) applied to hyperspectral images for cloud detection. The first instance of this use case ran on an Intel Myriad 2 VPU on board a CubeSat optimized for low cost, size, and power efficiency. Nevertheless, this solution has several drawbacks because the device was not specifically designed for the space environment, which limits its applicability to short-lifetime Low Earth Orbit (LEO) applications. The current work provides a benchmark between the Myriad 2 and our custom hardware accelerator designed for Field Programmable Gate Arrays (FPGAs). The metrics used for comparison include inference time, power consumption, space qualification, and components. The obtained results show that the FPGA-based solution offers lower inference time and greater room for customization, but at the cost of higher power consumption and a longer time to market. In conclusion, the proposed approach might extend the potential market of DL-based solutions to long-term LEO or interplanetary exploration missions through deployment on space-qualified FPGAs, at a limited cost in energy efficiency.
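As an illustration of how such comparison metrics could be gathered on any edge inference device, the minimal sketch below times repeated inferences and derives energy per inference from an externally measured average power draw; `run_inference`, the sample workload, and the power figure are placeholders, not values from the CloudScout benchmark.

```python
import time

def benchmark(run_inference, sample, runs=100, avg_power_w=None):
    """Average latency over `runs` inferences; energy per inference if a
    measured average power draw (in watts) is supplied."""
    run_inference(sample)                          # warm-up call
    t0 = time.perf_counter()
    for _ in range(runs):
        run_inference(sample)
    latency_s = (time.perf_counter() - t0) / runs
    energy_j = avg_power_w * latency_s if avg_power_w is not None else None
    return latency_s, energy_j

# Dummy usage with a stand-in workload and a placeholder power figure.
latency, energy = benchmark(lambda x: sum(v * v for v in x),
                            list(range(10_000)), avg_power_w=2.0)
print(f"latency: {latency*1e3:.2f} ms, energy/inference: {energy:.4f} J")
```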
Recurrent Neural Networks (RNNs) have become important tools for tasks such as speech recognition, text generation, and natural language processing. However, their inference may involve billions of operations, and their large number of parameters leads to high storage size and runtime memory usage. These factors impede the adoption of such models in real-time, on-the-edge applications. Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) have emerged as promising solutions for the hardware acceleration of these algorithms, thanks to the customization of compute data paths and memory subsystems they allow, which lets them take full advantage of compression techniques in terms of area, timing, and power consumption. In contrast to the extensive literature on compression and quantization of plain feed-forward neural networks, little attention has been paid to reducing the computational resource requirements of RNNs. This work proposes a new, effective methodology for the post-training quantization of RNNs, focusing in particular on Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. The proposed quantization strategy is meant to be a detailed guideline for the design of custom hardware accelerators for LSTM/GRU-based algorithms, to be implemented on FPGA or ASIC devices using fixed-point arithmetic only. We applied our methods to LSTM/GRU models pretrained on the IMDb sentiment classification dataset and the Penn TreeBank language modelling dataset, comparing each quantized model to its floating-point counterpart. The results show that up to 90% memory footprint reduction is achievable in both cases, with less than 1% loss in accuracy and even a slight improvement in the perplexity-per-word metric, respectively. The results are presented in terms of the trade-offs between memory footprint reduction and accuracy, demonstrating the benefits of the proposed methodology also in comparison with other works from the literature.
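As a minimal sketch of what one post-training fixed-point quantization step can look like, the snippet below maps a weight tensor to signed integers with a per-tensor power-of-two scale and reports the resulting rounding error; the bit width, the gate-matrix shape, and the function names are illustrative assumptions, not the specific scheme proposed in the paper.

```python
import numpy as np

def quantize_fixed_point(w, total_bits=8):
    """Quantize a tensor to signed fixed point with a power-of-two scale.

    Returns the integer representation and the number of fractional bits,
    chosen so that the largest magnitude still fits in `total_bits`.
    """
    max_abs = np.max(np.abs(w))
    # Integer bits needed to cover the dynamic range (sign bit excluded).
    int_bits = max(0, int(np.floor(np.log2(max_abs + 1e-12))) + 1)
    frac_bits = total_bits - 1 - int_bits          # bits left for the fraction
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(w * scale),
                -2 ** (total_bits - 1), 2 ** (total_bits - 1) - 1).astype(np.int32)
    return q, frac_bits

def dequantize(q, frac_bits):
    """Recover an approximate floating-point tensor for accuracy evaluation."""
    return q.astype(np.float32) / (2.0 ** frac_bits)

# Example: quantize one (hypothetical) LSTM gate kernel to 8-bit fixed point.
w_gate = (np.random.randn(128, 64) * 0.5).astype(np.float32)
q_gate, fb = quantize_fixed_point(w_gate, total_bits=8)
print("fractional bits:", fb,
      "max rounding error:", np.max(np.abs(dequantize(q_gate, fb) - w_gate)))
```

In an actual LSTM/GRU accelerator, each gate's weights, biases, and activations would typically receive their own fixed-point format, selected from a profiling pass over representative data.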
In recent years, Convolutional Neural Networks (CNNs) have demonstrated outstanding results in several emerging classification tasks. Their high-quality predictions are often achieved with computationally intensive workloads that hinder the hardware acceleration of these models at the edge. Field Programmable Gate Arrays (FPGAs) have proven to be energy-efficient platforms for the execution of these algorithms, and works proposing methods to automate the design flow on these devices have gained relevance. The common purpose is to enable a wide range of users without specific skills to accelerate CNNs on FPGAs with reduced development times. In this paper, we present FPG-AI, a technology-independent toolflow for automating the deployment of CNNs on FPGA. The framework combines model compression strategies with a fully handcrafted Hardware Description Language (HDL)-based accelerator that poses no limit on device portability. On top of that, an automation process merges the two design spaces to define an end-to-end, ready-to-use tool. Experimental results are reported for reference models from the literature (LeNet, NiN, VGG16, MobileNet-V1) on multiple classification datasets (MNIST, CIFAR10, ImageNet). To prove the technology independence of FPG-AI, we characterize the toolflow on devices with heterogeneous resource budgets from different vendors (Xilinx, Intel, and Microsemi). Comparison with state-of-the-art works confirms the unmatched device portability of FPG-AI and shows performance metrics in line with the literature.
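To give a flavor of what an automated FPGA deployment flow involves, the sketch below strings together the three stages the abstract mentions: model compression, design-space exploration against a device resource budget, and emission of HDL parameters. Every name, cost model, and resource figure here is a hypothetical placeholder and is not part of the actual FPG-AI toolflow.

```python
from dataclasses import dataclass

@dataclass
class DeviceBudget:
    """Coarse resource budget of the target FPGA (illustrative fields)."""
    luts: int
    dsps: int
    bram_kb: int

def compress(model_name, weight_bits=8, act_bits=8):
    # Placeholder for post-training quantization of weights and activations.
    return {"model": model_name, "w_bits": weight_bits, "a_bits": act_bits}

def explore(compressed, budget):
    # Pick the largest parallelism whose (crude) resource estimate fits the device.
    for pes in (128, 64, 32, 16, 8):
        dsp_estimate = pes                       # e.g. one multiplier per PE
        bram_estimate = pes * 4                  # e.g. 4 KB of buffering per PE
        if dsp_estimate <= budget.dsps and bram_estimate <= budget.bram_kb:
            return {"processing_elements": pes}
    raise RuntimeError("model does not fit the selected device")

def emit_hdl_parameters(compressed, mapping):
    # Render the chosen configuration as HDL generics/constants.
    return (f"constant W_BITS : integer := {compressed['w_bits']};\n"
            f"constant N_PE   : integer := {mapping['processing_elements']};\n")

budget = DeviceBudget(luts=230_000, dsps=740, bram_kb=4_600)  # mid-range device
cfg = compress("VGG16")
print(emit_hdl_parameters(cfg, explore(cfg, budget)))
```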
In recent years, FPGAs have demonstrated remarkable performance and limited power consumption for the on-the-edge inference of Convolutional Neural Networks (CNNs). One of the main challenges in implementing this class of algorithms on an FPGA is resource management, especially with regard to memory. This work presents a multi-cache system that noticeably shrinks the required on-chip memory with a negligible variation in timing performance and power consumption. The presented methods have been applied to the CloudScout CNN, which was developed to perform cloud detection directly on board the satellite, thus representing a relevant case study for on-the-edge applications. The system was validated and characterized on a Xilinx ZCU106 Evaluation Board. The result is a 64.48% memory saving compared to an alternative hardware accelerator developed for the same algorithm, with comparable performance in terms of inference time and power consumption. The paper also presents a detailed analysis of the hardware accelerator's power consumption, focusing on the impact of data transfer between the accelerator and the external memory. Further investigation shows that the proposed strategies allow the implementation of the accelerator on smaller FPGAs, guaranteeing benefits in terms of power consumption and hardware costs. A broader evaluation of the applicability of the presented methods to other models demonstrates valuable memory savings with respect to other works reported in the literature.
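The kind of saving such a caching scheme targets can be seen with a back-of-the-envelope comparison between buffering a full feature map on chip and keeping on chip only the rows a convolution window needs, spilling the rest to external memory. The layer size, bit width, and kernel size below are illustrative assumptions, not figures from the paper.

```python
# Illustrative on-chip memory comparison: full feature-map buffer vs. line cache.

def full_feature_map_bits(height, width, channels, act_bits):
    # Store the whole activation tensor on chip.
    return height * width * channels * act_bits

def line_cache_bits(width, channels, act_bits, kernel=3):
    # Keep only `kernel` rows per channel on chip; remaining rows stay in DRAM.
    return kernel * width * channels * act_bits

h = w = 512
channels, act_bits = 64, 16
full = full_feature_map_bits(h, w, channels, act_bits)
cache = line_cache_bits(w, channels, act_bits)
print(f"full buffer: {full / 8 / 1024:.0f} KiB, line cache: {cache / 8 / 1024:.0f} KiB "
      f"({100 * (1 - cache / full):.1f}% on-chip memory saved)")
```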