Maurizio Capra scite author profile

Deep Neural Networks (DNNs) are nowadays a common practice in most of the Artificial Intelligence (AI) applications. Their ability to go beyond human precision has made these networks a milestone in the history of AI. However, while on the one hand they present cutting edge performance, on the other hand they require enormous computing power. For this reason, numerous optimization techniques at the hardware and software level, and specialized architectures, have been developed to process these models with high performance and power/energy efficiency without affecting their accuracy. In the past, multiple surveys have been reported to provide an overview of different architectures and optimization techniques for efficient execution of Deep Learning (DL) algorithms. This work aims at providing an up-to-date survey, especially covering the prominent works from the last 3 years of the hardware architectures research for DNNs. In this paper, the reader will first understand what a hardware accelerator is, and what are its main components, followed by the latest techniques in the field of dataflow, reconfigurability, variable bit-width, and sparsity.

show abstract

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

Capra

et al. 2020

View full text Add to dashboard Cite

Currently, Machine Learning (ML) is becoming ubiquitous in everyday life. Deep Learning (DL) is already present in many applications ranging from computer vision for medicine to autonomous driving of modern cars as well as other sectors in security, healthcare, and finance. However, to achieve impressive performance, these algorithms employ very deep networks, requiring a significant computational power, both during the training and inference time. A single inference of a DL model may require billions of multiply-and-accumulated operations, making the DL extremely compute-and energy-hungry. In a scenario where several sophisticated algorithms need to be executed with limited energy and low latency, the need for cost-effective hardware platforms capable of implementing energy-efficient DL execution arises. This paper first introduces the key properties of two brain-inspired models like Deep Neural Network (DNN), and Spiking Neural Network (SNN), and then analyzes techniques to produce efficient and high-performance designs. This work summarizes and compares the works for four leading platforms for the execution of algorithms such as CPU, GPU, FPGA and ASIC describing the main solutions of the state-of-the-art, giving much prominence to the last two solutions since they offer greater design flexibility and bear the potential of high energy-efficiency, especially for the inference process. In addition to hardware solutions, this paper discusses some of the important security issues that these DNN and SNN models may have during their execution, and offers a comprehensive section on benchmarking, explaining how to assess the quality of different networks and hardware systems designed for them.

show abstract

Edge Computing: A Survey On the Hardware Requirements in the Internet of Things World

et al. 2019

View full text Add to dashboard Cite

In today’s world, ruled by a great amount of data and mobile devices, cloud-based systems are spreading all over. Such phenomenon increases the number of connected devices, broadcast bandwidth, and information exchange. These fine-grained interconnected systems, which enable the Internet connectivity for an extremely large number of facilities (far beyond the current number of devices) go by the name of Internet of Things (IoT). In this scenario, mobile devices have an operating time which is proportional to the battery capacity, the number of operations performed per cycle and the amount of exchanged data. Since the transmission of data to a central cloud represents a very energy-hungry operation, new computational paradigms have been implemented. The computation is not completely performed in the cloud, distributing the power load among the nodes of the system, and data are compressed to reduce the transmitted power requirements. In the edge-computing paradigm, part of the computational power is moved toward data collection sources, and, only after a first elaboration, collected data are sent to the central cloud server. Indeed, the “edge” term refers to the extremities of systems represented by IoT devices. This survey paper presents the hardware architectures of typical IoT devices and sums up many of the low power techniques which make them appealing for a large scale of applications. An overview of the newest research topics is discussed, besides a final example of a complete functioning system, embedding all the introduced features.

show abstract

Steerable-Discrete-Cosine-Transform (SDCT): Hardware Implementation and Performance Analysis

Peloso

Capra

Sole

et al. 2020

Sensors

View full text Add to dashboard Cite

In the last years, the need for new efficient video compression methods grown rapidly as frame resolution has increased dramatically. The Joint Collaborative Team on Video Coding (JCT-VC) effort produced in 2013 the H.265/High Efficiency Video Coding (HEVC) standard, which represents the state of the art in video coding standards. Nevertheless, in the last years, new algorithms and techniques to improve coding efficiency have been proposed. One promising approach relies on embedding direction capabilities into the transform stage. Recently, the Steerable Discrete Cosine Transform (SDCT) has been proposed to exploit directional DCT using a basis having different orientation angles. The SDCT leads to a sparser representation, which translates to improved coding efficiency. Preliminary results show that the SDCT can be embedded into the HEVC standard, providing better compression ratios. This paper presents a hardware architecture for the SDCT, which is able to work at a frequency of 188 M Hz , reaching a throughput of 3.00 GSample/s. In particular, this architecture supports 8k UltraHigh Definition (UHD) (7680 × 4320) with a frame rate of 60 Hz , which is one of the best resolutions supported by HEVC.

show abstract

VLSI Architectures for the Steerable-Discrete-Cosine-Transform (SDCT)

Sole

Peloso

Capra

et al. 2020

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Maurizio Capra

An Updated Survey of Efficient Hardware Architectures for Accelerating Deep Convolutional Neural Networks

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

Edge Computing: A Survey On the Hardware Requirements in the Internet of Things World

Steerable-Discrete-Cosine-Transform (SDCT): Hardware Implementation and Performance Analysis

VLSI Architectures for the Steerable-Discrete-Cosine-Transform (SDCT)

Contact Info

Product

Resources

About