Cool Mega-Array: A highly energy efficient reconfigurable accelerator

Ozaki, Nobuyuki; Yoshihiro, Yasuda; Saito, Yoshiki; Ikebuchi, Daisuke; Kimura, Masayuki; Amano, Hideharu; Nakamura, Hiroshi; Usami, Kimiyoshi; Namiki, Mitaro; Kondo, Masaaki

doi:10.1109/fpt.2011.6132668

Cited by 14 publications

(7 citation statements)

References 8 publications


“…Compared with a 41.5-GOPS/ 0.775W (54.8-MOPS/mW) dynamically reconfigurable accelerator [7] and a 3.2-GOPS/50-mW VLIW accelerator (64-MOPS/mW) [4], the CMA-1 has better energy efficiency. Detail analysis of performance and energy consumption of CMA-1 is shown in [1].…”

Section: E Evaluation Summary (mentioning)

confidence: 99%

The realtime image processing demonstration with CMA-1: An ultra low-power reconfigurable accelerator

Hironaka, Ozaki, Amano (2011)

2011 International Conference on Field-Programmable Technology


CMA (Cool Mega Array) is an ultra low-power reconfigurable accelerator with a large PE (Processing Element) array consisting of combinational circuits. Although the configuration is static during execution, various types of applications can be implemented by using the versatile data manipulation instructions of the attached micro-controller. By using a real CMA-1 chip, we will demonstrate that CMA-1 can process various image processing tasks with extremely low power only required for the computation.
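The efficiency figures quoted in the citation statement above are throughput divided by power. As a minimal sketch (the function name `mops_per_mw` is hypothetical and does not come from either paper), the unit conversion can be checked in Python using the numbers exactly as quoted. The VLIW figure reproduces as 64 MOPS/mW; the 41.5-GOPS/0.775-W pair works out to roughly 53.5 MOPS/mW rather than the quoted 54.8, a gap the statement itself does not explain.

```python
def mops_per_mw(gops: float, watts: float) -> float:
    """Convert throughput in GOPS and power in W to MOPS per mW."""
    mops = gops * 1_000.0         # 1 GOPS = 1000 MOPS
    milliwatts = watts * 1_000.0  # 1 W = 1000 mW
    return mops / milliwatts


# Figures as quoted in the citation statement (not re-measured here).
quoted = {
    "dynamically reconfigurable accelerator [7]": (41.5, 0.775),
    "VLIW accelerator [4]": (3.2, 0.050),
}

for name, (gops, watts) in quoted.items():
    print(f"{name}: {mops_per_mw(gops, watts):.1f} MOPS/mW")
```

Because both conversions scale by the same factor of 1000, GOPS/W and MOPS/mW are numerically identical; the explicit steps are kept only to make the units visible.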


“…For example, in ADRES [2] the power overhead of the VLIW processor used to handle the data memory access is up to 20%. In CMA [39] the host CPU feeds the data into the PEs through a shared fetch register (FR) file. This is very inefficient in terms of flexibility.…”

Section: A Architecture (mentioning)

confidence: 99%

An Energy-Efficient Integrated Programmable Array Accelerator and Compilation Flow for Near-Sensor Ultralow Power Processing

Das, Martin, Rossi, et al. (2019)

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.


In this paper we give a fresh look to Coarse Grained Reconfigurable Arrays (CGRAs) as ultra-low power accelerators for near-sensor processing. We present a general-purpose Integrated Programmable-Array accelerator (IPA) exploiting a novel architecture, execution model, and compilation flow for application mapping that can handle kernels containing complex control flow, without the significant energy overhead incurred by state of the art predication approaches. To optimize the performance and energy efficiency, we explore the IPA architecture with special focus on shared memory access, with the help of the flexible compilation flow presented in this paper. We achieve a maximum energy gain of 2×, and performance gain of 1.33× and 1.8× compared with state of the art partial and full predication techniques, respectively. The proposed accelerator achieves an average energy efficiency of 1617 MOPS/mW operating at 100MHz, 0.6V in 28nm UTBB FD-SOI technology, over a wide range of near-sensor processing kernels, leading to an improvement up to 18×, with an average of 9.23× (as well as a speed-up up to 20.3×, with an average of 9.7×) compared to a core specialized for ultra-low power near-sensor processing.


“…Both of them employ multiple cores with the objective of distributing heterogeneous tasks to each core to save power, not to reach a significant amount of speedup with respect to single-core solutions. Past work on integration of low-power microcontrollers with accelerators has mainly concentrated on coupling with special-purpose computing devices such as specialized DSPs [23] [24], ASICs [5] [25], or reconfigurable computing fabrics [26]. Contrarily to the model we propose, none of these platforms can be considered fully programmable in a general-purpose sense, and no one supports a full offload of code from a host.…”

Section: Related Work (mentioning)

confidence: 99%

Enabling the Heterogeneous Accelerator Model on Ultra-Low Power Microcontroller Platforms

Conti, Palossi, Marongiu, et al. (2016)

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)


The stringent power constraints of complex microcontroller based devices (e.g. smart sensors for the IoT) represent an obstacle to the introduction of sophisticated functionality. Programmable accelerators would be extremely beneficial to provide the flexibility and energy efficiency required by fast-evolving IoT applications; however, the integration complexity and sub-10mW power budgets have been considered insurmountable obstacles so far. In this paper we demonstrate the feasibility of coupling a low power microcontroller unit (MCU) with a heterogeneous programmable accelerator for speeding-up computation-intensive algorithms at an ultra-low power (ULP) sub-10mW budget. Specifically, we develop a heterogeneous architecture coupling a Cortex-M series MCU with PULP, a programmable accelerator for ULP parallel computing. Complex functionality is enabled by the support for offloading parallel computational kernels from the MCU to the accelerator using the OpenMP programming model. We prototype this platform using a STM Nucleo board and a PULP FPGA emulator. We show that our methodology can deliver up to 60x gains in performance and energy efficiency on a diverse set of applications, opening the way for a new class of ULP heterogeneous architectures.
