ConvNets, or Convolutional Neural Networks (CNNs), are state-of-the-art classification algorithms, achieving near-human performance in visual recognition [1]. New trends such as augmented reality demand always-on visual processing in wearable devices. Yet, advanced ConvNets achieving high recognition rates are too expensive in terms of energy, as they require substantial data movement and billions of convolution computations. Today, state-of-the-art mobile GPUs and ConvNet accelerator ASICs [2][3] demonstrate energy efficiencies of only tens to several hundreds of GOPS/W, one order of magnitude below the requirements for always-on applications. This paper introduces the concept of hierarchical recognition processing, combined with the Envision platform: an energy-scalable ConvNet processor achieving efficiencies up to 10 TOPS/W while maintaining recognition rate and throughput. Envision thereby enables always-on visual recognition in wearable devices.

Figure 14.5.1 demonstrates the concept of hierarchical recognition. Here, a hierarchy of increasingly complex, individually trained ConvNets, with different topologies, different network sizes, and increasing computational precision requirements, is used in the context of person identification. This enables constant scanning for faces at very low average energy cost, yet rapidly scales up to more complex networks detecting a specific face, such as a device's owner, all the way up to full VGG-16-based 5760-face recognition. The opportunities afforded by such a hierarchical approach extend far beyond face recognition alone, but can only be exploited by digital systems demonstrating wide-range energy scalability across computational precision. The state-of-the-art ASICs in references [3] and [4] show only 1.5× and 8.2× energy-efficiency scalability, respectively. Envision improves upon this by introducing subword-parallel Dynamic-Voltage-Accuracy-Frequency Scaling (DVAFS), a circuit-level technique enabling 40× energy-precision scalability at constant throughput.

Figure 14.5.2 illustrates the basic principle of DVAFS and compares it to Dynamic-Accuracy Scaling (DAS) and Dynamic-Voltage-Accuracy Scaling (DVAS) [4]. In DAS, switching activity, and hence energy consumption, is reduced for low-precision computations by rounding and masking a configurable number of LSBs at the inputs of multiply-accumulate (MAC) units. DVAS exploits the shorter critical paths of DAS's reduced-precision modes by combining them with voltage scaling for increased energy scalability. This paper proposes subword-parallel DVAFS, which further improves upon DVAS by reusing arithmetic cells that are inactive at reduced precision. These can be reconfigured to compute 2×1-8b or 4×1-4b words per cycle (N×1-16b/N, with N the level of subword parallelism), rather than 1×1-16b, when operating at 8b precision or below. At constant data throughput, this permits lowering the processor's frequency and voltage significantly below their DVAS values. As a result, DVAFS is a dynamic precision technique which simultaneously lowers all run-time adjustable contributors to dynamic power: switching activity, frequency, and voltage.
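To make the subword-parallel reuse concrete, the following minimal Python sketch models the reconfiguration described above: one 16b datapath word is packed with N = 16/precision independent subwords, so the same hardware delivers N MAC results per cycle at reduced precision. The packing scheme and function names are illustrative assumptions, not Envision's actual datapath.

```python
# Sketch of subword-parallel operation (the functional core of DVAFS):
# one 16b word carries N = 16 // prec independent subwords, so the same
# datapath produces N results per cycle at reduced precision.

def pack(values, prec):
    """Pack len(values) unsigned subwords of `prec` bits into one word."""
    word = 0
    for i, v in enumerate(values):
        assert 0 <= v < (1 << prec)
        word |= v << (i * prec)
    return word

def unpack(word, prec, n):
    """Extract n subwords of `prec` bits from one packed word."""
    mask = (1 << prec) - 1
    return [(word >> (i * prec)) & mask for i in range(n)]

def subword_mac(a_word, b_word, acc, prec, n):
    """Multiply-accumulate on n packed subwords; one call models one cycle.

    At constant throughput, halving precision doubles n, so the clock can
    run at f/n and the supply voltage drops with it (the 'VF' in DVAFS);
    this model captures only the functional subword reuse."""
    a = unpack(a_word, prec, n)
    b = unpack(b_word, prec, n)
    return [acc[i] + a[i] * b[i] for i in range(n)]

# 4x1-4b mode: four MACs per cycle on the same 16b-wide inputs.
a = pack([3, 5, 7, 9], prec=4)
b = pack([2, 4, 6, 8], prec=4)
print(subword_mac(a, b, acc=[0, 0, 0, 0], prec=4, n=4))  # [6, 20, 42, 72]
```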
Several applications in machine learning and machine-to-human interaction tolerate small deviations in their computations. Digital systems can exploit this fault tolerance to increase their energy efficiency, which is crucial in embedded applications. Hence, this paper introduces a new approach to Approximate Computing: Dynamic-Voltage-Accuracy-Frequency Scaling (DVAFS), a circuit-level technique enabling a dynamic trade-off of energy versus computational accuracy that outperforms other Approximate Computing techniques. The usage and applicability of DVAFS is illustrated in the context of Deep Neural Networks, the current state of the art in advanced recognition. These networks are typically executed on CPUs or GPUs due to their high computational complexity, making their deployment on battery-constrained platforms possible only through wireless connections with the cloud. This work shows how deep learning can be brought to IoT devices by running every layer of the network at its optimal computational accuracy. Finally, we demonstrate a DVAFS processor for Convolutional Neural Networks, achieving efficiencies of multiple TOPS/W.
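As a hedged illustration of "running every layer at its optimal computational accuracy," the sketch below scans candidate bitwidths per layer and keeps the cheapest one whose accuracy loss stays within a budget. The evaluator, layer descriptions, and threshold are hypothetical stand-ins, not the paper's actual method.

```python
# Minimal sketch of per-layer precision selection for a DNN, assuming an
# accuracy evaluator is available. All names and numbers are placeholders.

def lowest_safe_precision(evaluate_accuracy, layer, candidates=(16, 8, 4),
                          max_accuracy_drop=0.005):
    """Pick the smallest bitwidth for `layer` whose accuracy loss stays
    within budget, scanning from the cheapest candidate upward."""
    baseline = evaluate_accuracy(layer, bits=16)
    for bits in sorted(candidates):
        if baseline - evaluate_accuracy(layer, bits=bits) <= max_accuracy_drop:
            return bits
    return 16

# Toy usage with a dummy evaluator: one layer tolerates 4b, another needs 8b.
def dummy_eval(layer, bits):
    return 0.90 - (0.02 if bits < layer["min_bits"] else 0.0)

layers = [{"name": "conv1", "min_bits": 4}, {"name": "fc8", "min_bits": 8}]
print({l["name"]: lowest_safe_precision(dummy_eval, l) for l in layers})
# {'conv1': 4, 'fc8': 8}
```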
This paper presents a timing error detection and correction (EDaC) technique optimized for near/sub-threshold operation to recover the energy lost in conventional signoff margins. The presented EDaC requires no modifications to the processor pipeline and avoids imposing additional hold constraints on monitored paths by instantaneously checking for late activity. Further, two correction methods are discussed: a simple clock-gating method and a low-cycle-overhead clock-stretching method. Both provide robust last-minute error prevention. The EDaC is applied in a near/sub-threshold implementation of the CoolFlux DSP processor and incurs only 2.8% and 2.1% area overhead for detection and correction, respectively. Silicon measurements validate the EDaC system from 0.25 V to 0.7 V (1 MHz to 200 MHz) and show that it recovers all voltage margins in the near/sub-threshold region. The design achieves a minimum-energy point (MEP) of 8.1 pJ/cycle at 0.34 V and 10 MHz. Here, the EDaC technique reduces energy consumption by 17.6% to 48% relative to the signoff margins, depending on their conservatism, and it enables the processor to operate with only 12% energy overhead compared to its ideal, non-margined critical operating point.
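As a rough behavioral illustration of the clock-stretching correction (not the CoolFlux silicon implementation), the sketch below flags monitored-path activity landing in the tail of the clock period and stretches the current cycle instead of latching a possibly wrong value. All numbers are placeholders.

```python
# Behavioral model of timing EDaC with clock stretching: a detector watches
# for activity late in the clock period; on a flag, the cycle is extended so
# the correct result is still captured. Parameters are illustrative only.

def run_cycles(path_delays, period, detect_window=0.1, stretch=0.5):
    """Simulate cycles; return (total_time, errors_prevented).

    A path whose delay lands inside the tail `detect_window` of the period
    triggers the detector; that cycle is stretched by `stretch * period`
    rather than letting a timing error propagate."""
    total_time, prevented = 0.0, 0
    for delay in path_delays:
        if delay > period * (1.0 - detect_window):
            prevented += 1
            total_time += period * (1.0 + stretch)  # stretched cycle
        else:
            total_time += period                    # normal cycle
    return total_time, prevented

# Toy run: occasional slow cycles at an aggressive, margin-free period.
delays = [0.7, 0.8, 0.95, 0.75, 1.02, 0.8]
print(run_cycles(delays, period=1.0))  # (7.0, 2)
```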