Lukas Cavigelli scite author profile

Abstract-An ever increasing number of computer vision and image/video processing challenges are being approached using deep convolutional neural networks, obtaining state-of-the-art results in object recognition and detection, semantic segmentation, action recognition, optical flow and superresolution. Hardware acceleration of these algorithms is essential to adopt these improvements in embedded and mobile computer vision systems. We present a new architecture, design and implementation as well as the first reported silicon measurements of such an accelerator, outperforming previous work in terms of power-, area-and I/Oefficiency. The manufactured device provides up to 196 GOp/s on 3.09 mm 2 of silicon in UMC 65 nm technology and can achieve a power efficiency of 803 GOp/s/W. The massively reduced bandwidth requirements make it the first architecture scalable to TOp/s performance.

show abstract

YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration

Andri

Cavigelli

Rossi

et al. 2018

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

197

138

View full text Add to dashboard Cite

Abstract-Convolutional Neural Networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors or GP-GPUs. Recent developments in CNN accelerators for system-on-chip integration have reduced energy consumption significantly. Unfortunately, even these highly optimized devices are above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. This prevents the adoption of CNNs in future ultra-low power Internet of Things end-nodes for near-sensor analytics. Recent algorithmic and theoretical advancements enable competitive classification accuracy even when limiting CNNs to binary (+1/-1) weights during training. These new findings bring major optimization opportunities in the arithmetic core by removing the need for expensive multiplications, as well as reducing I/O bandwidth and storage. In this work, we present an accelerator optimized for binary-weight CNNs that achieves 1.5 TOp/s at 1.2 V on a core area of only 1.33 MGE (Million Gate Equivalent) or 1.9 mm 2 and with a power dissipation of 895 µW in UMC 65 nm technology at 0.6 V. Our accelerator significantly outperforms the state-of-the-art in terms of energy and area efficiency achieving 61.2 TOp/s/W@0.6 V and 1.1 TOp/s/MGE@1.2 V, respectively.

show abstract

CAS-CNN: A deep convolutional neural network for image compression artifact suppression

2017

View full text Add to dashboard Cite

Abstract-Lossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media. However, we pay for their high compression rate with visual artifacts degrading the user experience. Deep convolutional neural networks have become a widespread tool to address high-level computer vision tasks very successfully. Recently, they have found their way into the areas of low-level computer vision and image processing to solve regression problems mostly with relatively shallow networks.We present a novel 12-layer deep convolutional network for image compression artifact suppression with hierarchical skip connections and a multi-scale loss function. We achieve a boost of up to 1.79 dB in PSNR over ordinary JPEG and an improvement of up to 0.36 dB over the best previous ConvNet result. We show that a network trained for a specific quality factor (QF) is resilient to the QF used to compress the input image-a single network trained for QF 60 provides a PSNR gain of more than 1.5 dB over the wide QF range from 40 to 76.

show abstract

Accelerating real-time embedded scene labeling with convolutional networks

Cavigelli

Magno

Benini

2015

View full text Add to dashboard Cite

Today there is a clear trend towards deploying advanced computer vision (CV) systems in a growing number of application scenarios with strong real-time and power constraints. Brain-inspired algorithms capable of achieving recordbreaking results combined with embedded vision systems are the best candidate for the future of CV and video systems due to their flexibility and high accuracy in the area of image understanding. In this paper, we present an optimized convolutional network implementation suitable for real-time scene labeling on embedded platforms. We show that our algorithm can achieve up to 96 GOp/s, running on the Nvidia Tegra K1 embedded SoC. We present experimental results, compare them to the state-of-the-art, and demonstrate that for scene labeling our approach achieves a 1.5x improvement in throughput when compared to a modern desktop CPU at a power budget of only 11 W.

show abstract

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

et al. 2016

View full text Add to dashboard Cite

Convolutional Neural Networks (CNNs) have revolutionized the world of image classification over the last few years, pushing the computer vision close beyond human accuracy. The required computational effort of CNNs today requires powerhungry parallel processors and GP-GPUs. Recent efforts in designing CNN Application-Specific Integrated Circuits (ASICs) and accelerators for System-On-Chip (SoC) integration have achieved very promising results. Unfortunately, even these highly optimized engines are still above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. On the algorithmic side, highly competitive classification accuracy can be achieved by properly training CNNs with binary weights. This novel algorithm approach brings major optimization opportunities in the arithmetic core by removing the need for the expensive multiplications as well as in the weight storage and I/O costs. In this work, we present a HW accelerator optimized for BinaryConnect CNNs that achieves 1510 GOp/s on a core area of only 1.33 MGE and with a power dissipation of 153 mW in UMC 65 nm technology at 1.2 V. Our accelerator outperforms state-of-the-art performance in terms of ASIC energy efficiency as well as area efficiency with 61.2 TOp/s/W and 1135 GOp/s/MGE, respectively.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Lukas Cavigelli

Origami: A 803-GOp/s/W Convolutional Network Accelerator

YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration

CAS-CNN: A deep convolutional neural network for image compression artifact suppression

Accelerating real-time embedded scene labeling with convolutional networks

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

Contact Info

Product

Resources

About