2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers 2009
DOI: 10.1109/isscc.2009.4977407

A 300mV 494GOPS/W reconfigurable dual-supply 4-Way SIMD vector processing accelerator in 45nm CMOS

Intel, Hillsboro, OR

Abstract: High-throughput parallel SIMD vector computations are the most performance- and power-critical operations in multimedia, graphics and signal processing workloads. An array of SIMD vector processing engines delivers high-throughput short bit-width arithmetic operations on large data sets with orders of magnitude higher energy efficiencies vs. general-purpose cores [1,2]. A reconfigurable 4-way SIMD engine targeted for on-die acceleration of vector processing in power-constrained mobile microp…

Citations: cited by 13 publications (11 citation statements)
References: 3 publications
“…Compared with a 41.5-GOPS/0.775-W (54.8-MOPS/mW) dynamically reconfigurable accelerator [15] and a 3.2-GOPS/50-mW VLIW accelerator (64-MOPS/mW) [13], the CMA-1 has better energy efficiency. Although its energy efficiency is worse compared with that of a 494-GOPS/W SIMD accelerator [14], the reported evaluation result for that accelerator was the peak energy consumption ratio only for the array part when the subthreshold voltage is used, while the sustained performance for actual application programs running on an actual chip is shown here.…”
Section: B Real Chip Measurement Evaluation 1) Delay Model
Citation type: mentioning
confidence: 77%
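The efficiency figures in the statement above mix GOPS/W and MOPS/mW. The two units are numerically identical, since both throughput and power scale by 1000. A minimal sketch of the conversion follows; the function name `mops_per_mw` is hypothetical and used only for illustration, and the inputs are the throughput and power values quoted in the statement.

```python
def mops_per_mw(gops: float, watts: float) -> float:
    """Convert a throughput in GOPS and a power in W to MOPS/mW."""
    # 1 GOPS = 1000 MOPS and 1 W = 1000 mW, so the two factors cancel:
    # a value in GOPS/W is numerically the same value in MOPS/mW.
    return (gops * 1e3) / (watts * 1e3)


# Values taken from the quoted citation statement.
vliw = mops_per_mw(3.2, 0.050)       # 3.2 GOPS at 50 mW
reconfig = mops_per_mw(41.5, 0.775)  # 41.5 GOPS at 0.775 W
simd = mops_per_mw(494.0, 1.0)       # 494 GOPS/W, taken per watt

print(f"VLIW accelerator:           {vliw:.1f} MOPS/mW")
print(f"Reconfigurable accelerator: {reconfig:.1f} MOPS/mW")
print(f"SIMD accelerator:           {simd:.1f} MOPS/mW")
```

Direct division reproduces the 64 MOPS/mW quoted for the VLIW accelerator. For the reconfigurable accelerator it gives roughly 53.5 MOPS/mW, slightly below the 54.8 MOPS/mW quoted in the statement.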
“…We propose two architectural mechanisms that complement prior work on variation tolerance in parallel architectures [10]. Decoupled parallel SIMD pipelines extends the work on timing speculation to parallel pipelines, providing tolerance of input-dependent and dynamic variations.…”
Section: Synctium Processor
Citation type: mentioning
confidence: 99%
“…Additionally, process variations can lead to nonfunctioning pipeline components, which may reduce both yield and performance. Prior approaches addressed this challenge at a coarse granularity, adjusting the supply voltage for entire PEs [10], [17]. Fine-grained approaches, such as voltage interpolation [13], are problematic because of the large overheads involved in providing the numerous supplies necessary for the massive number of low-power computational units.…”
Section: Pipeline Weaving For Addressing Static Timing Uncertainties
Citation type: mentioning
confidence: 99%
“…High energy efficiency of operation at low voltages has been established for 65, 45, 32, 22 nm technologies [7,22,28,29]. Kaul et al. [27,29] show that as the supply voltage of the transistor is reduced, the energy efficiency increases, and is maximum near the threshold voltage of the transistor.…”
Section: Process Variation
Citation type: mentioning
confidence: 99%