Energy-efficient vision on the PULP platform for ultra-low power parallel computing

Conti, Francesco; Rossi, Davide; Pullini, Antonio; Loi, Igor; Benini, Luca

doi:10.1109/sips.2014.6986099

Cited by 18 publications

(11 citation statements)

References 33 publications


“…Although the flow we have shown is specific to P2012, we argue that the underlying heterogeneity approach and the proposed methodology are applicable to the entire class of clustered many-cores (e.g. Kalray MPPA [24] and PULP [14]). …”

Section: Discussion (mentioning)

confidence: 99%

He-P2012: Performance and Energy Exploration of Architecturally Heterogeneous Many-Cores

Conti

Marongiu

Pilkington

et al. 2015

J Sign Process Syst

Self Cite


The end of Dennardian scaling in advanced technologies brought about new architectural templates to overcome the so-called utilization wall and provide Moore's Law-like performance and energy scaling in embedded SoCs. One of the most promising templates, architectural heterogeneity, is hindered by high cost due to the design space explosion and the lack of effective exploration tools. Our work provides three contributions towards a scalable and effective methodology for design space exploration in embedded MC-SoCs. First, we present the He-P2012 architecture, augmenting the state-of-the-art STMicroelectronics P2012 platform with heterogeneous shared-L1 coprocessors called HW processing elements (HWPE). Second, we propose a novel methodology for the semi-automatic definition and instantiation of shared-memory HWPEs from a C source, supporting both simple and structured data types. Third, we demonstrate that the integration of HWPEs can provide significant performance and energy efficiency benefits on a set of benchmarks originally developed for the homogeneous P2012, achieving up to 123x speedup on the accelerated code region (∼98 % of Amdahl's law limit) while saving 2/3 of the energy.
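For readers unfamiliar with the Amdahl's law limit mentioned in this abstract, the following is a minimal sketch of the textbook formula, not a reproduction of the paper's result. The 123x region speedup is taken from the abstract; the accelerated fraction `f` is a hypothetical value chosen for illustration, since the abstract does not state it, and the function names are likewise illustrative.

```python
def amdahl_speedup(fraction: float, region_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime is sped up by `region_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / region_speedup)


def amdahl_limit(fraction: float) -> float:
    """Upper bound on overall speedup as the region speedup grows without bound."""
    return 1.0 / (1.0 - fraction)


# Assumption: f = 0.9 is hypothetical; only s = 123 comes from the abstract.
f, s = 0.9, 123.0
overall = amdahl_speedup(f, s)
limit = amdahl_limit(f)
print(f"overall {overall:.2f}x, limit {limit:.2f}x, ratio {overall / limit:.1%}")
```

The ratio printed at the end shows how close a finite region speedup comes to the bound for the assumed fraction; it will differ from the paper's figure unless the real fraction is used.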


“…3. a chip has been realized based on the PULP architecture [8] with 4 Or10n cores, 16 kB of L2 memory, 16 kB of tightly coupled data memory (TCDM) organized into 8 banks and 4 kB of instruction cache. Each core has a dedicated FPU capable of additions, subtractions and multiplications with 2 cycles of latency.…”

Section: A Chip Architecture (mentioning)

confidence: 99%

Approximate 32-bit floating-point unit design with 53% power-area product reduction

Camus

Schlachter

Enz

et al. 2016

ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference


Abstract: The floating-point unit is one of the most common building blocks in any computing system and is used for a huge number of applications. By combining two state-of-the-art techniques of imprecise hardware, namely Gate-Level Pruning and Inexact Speculative Adder, and by introducing a novel Inexact Speculative Multiplier architecture, three different approximate FPUs and one reference IEEE-754 compliant FPU have been integrated in a 65 nm CMOS process within a low-power multi-core processor. Silicon measurements show up to 27 % power, 36 % area and 53 % power-area product savings compared to the IEEE-754 single-precision FPU. Accuracy loss has been evaluated with a high-dynamic-range image tone-mapping algorithm, resulting in small but non-visible errors with image PSNR value of 90 dB.

I. INTRODUCTION

With the forecasted end of Moore's law and the increasing complexity to design and fabricate integrated circuits, power and reliability have become the main challenges to technology scaling. Power has definitely emerged as a critical issue due to the poor scaling of V_DD and V_th, while transistor miniaturization reaching the nanoscopic scale has led to extreme Process-Voltage-Temperature (PVT) variations. Unfortunately, achieving low power and robustness against PVT variations requires complicated and conflicting design constraints. As a consequence, designers are being pushed to seek new energy-efficient circuit design and computing techniques to meet the exploding demand for data processing from mobile devices and cloud services.

Approximate computing [1, 2] has emerged as a promising solution to sustain computing advancement and overcome the limitations in technology scaling. This approach explores a new trade-off between energy or circuit costs and application accuracy. A myriad of applications could tolerate trading off a little bit of accuracy without compromising their functionality or user experience. In multimedia applications for instance, a small proportion of errors remains imperceptible to humans.

To design approximate systems, several approaches have been investigated at different hardware levels, such as voltage-frequency over-scaling [3] at physical level or significance-based memory protection [4] at algorithmic level. At circuit level, an interesting approach is to perform computations using approximate arithmetic operators, such as adders and multipliers, allowing a controlled and limited amount of errors against significant power saving or performance increase. This paper focuses on two of these techniques: Gate-level Pruning [5] and Inexact Speculative Adder [6], which have both demonstrated significant savings simultaneously in energy, delay and area at the cost of reasonable errors.
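The three headline savings in this abstract are arithmetically consistent with one another, which a short check makes visible. This is a sketch under one assumption: that the "up to" power and area figures refer to the same approximate FPU design, which the abstract does not state explicitly. The function name is illustrative.

```python
def combined_saving(power_saving: float, area_saving: float) -> float:
    """Fractional saving in the power-area product, given the saving in each factor."""
    # Remaining power times remaining area gives the remaining product.
    return 1.0 - (1.0 - power_saving) * (1.0 - area_saving)


# Figures from the abstract: up to 27 % power and 36 % area savings.
saving = combined_saving(0.27, 0.36)
print(f"{saving:.1%}")  # prints 53.3%
```

The savings multiply rather than add because the product shrinks by the remaining fraction of each factor: 0.73 × 0.64 ≈ 0.467 of the reference remains.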


“…The multi-cluster design is a common solution applied to overcome scalability limitations in modern manycore accelerators, such as STM STHORM [2], Plurality HAL [8], KALRAY MMPA [4] and PULP [3].…”

Section: Architectural Template (mentioning)

confidence: 99%

“…Architectural heterogeneity is an effective design paradigm to build energy-efficient embedded vision systems. A common platform relies on system-on-chip (SoC) integration of a host processor and one or more programmable manycore accelerators (MCA) [2] [8] [4] [3]. MCAs provide tens to hundreds of small processing units, connected to a shared on-chip memory via a low-latency, high-throughput …”

Section: Introduction (mentioning)

confidence: 99%

A framework for optimizing OpenVX applications performance on embedded manycore accelerators

Tagliavini

Haugou

Marongiu

et al. 2015

Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems

Self Cite


Nowadays Computer Vision applications are ubiquitous, and their presence on embedded devices is more and more widespread. Heterogeneous embedded systems featuring a clustered manycore accelerator are a very promising target to execute embedded vision algorithms, but the code optimization for these platforms is a challenging task. Moreover, designers really need support tools that are both fast and accurate. In this work we introduce ADRENALINE, an environment for development and optimization of OpenVX applications targeting manycore accelerators. ADRENALINE consists of a custom OpenVX run-time and a virtual platform, and overall it is intended to provide support to enhance performance of embedded vision applications.
