2013
DOI: 10.1145/2518037.2491464

Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators

Abstract: We present a taxonomy and modular implementation approach for data-parallel accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) architectural design patterns. We have developed a new VT microarchitecture, Maven, based on the traditional vector-SIMD microarchitecture that is considerably simpler to implement and easier to program than previous VT designs. Using an extensive design-space exploration of full VLSI implementations of many accelerator design points, we evaluate …
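As a purely illustrative sketch (not code from the paper), the two plain-C loops below show the kinds of data-level parallelism the compared design patterns target: a regular loop that maps cleanly onto lockstep vector-SIMD or subword-SIMD hardware, and an irregular loop with gathered loads and a data-dependent branch, which is where SIMT and vector-thread designs such as Maven aim to stay both efficient and easy to program. Function names and the cutoff example are hypothetical.

#include <stddef.h>

/* Regular DLP: every element follows the same control path, so a
 * vector-SIMD or subword-SIMD machine can execute the loop in lockstep. */
void saxpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Irregular DLP: indexed (gather) loads and a data-dependent branch. */
void threshold_gather(size_t n, const int *idx, const float *src,
                      float *dst, float cutoff) {
    for (size_t i = 0; i < n; i++) {
        float v = src[idx[i]];            /* gather: data-dependent address */
        dst[i] = (v > cutoff) ? v : 0.0f; /* divergent control flow */
    }
}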

Cited by 18 publications (14 citation statements)
References 8 publications
“…[22,21] propose vector-thread architectures, a hybrid of SIMD and SIMT (Single Instruction Multiple Thread) designed specifically to improve parallel loops with irregular data access and control flow. Qualcomm Hexagon [7] is a VLIW DSP with hardware multi-threading and SIMD functional units that is optimized for mobile heterogeneous computing.…”
Section: Related Work (mentioning)
confidence: 99%
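To make the SIMD-versus-VT contrast in this statement concrete, here is a minimal, hypothetical C sketch of how a lockstep SIMD machine handles a data-dependent branch: both arms are evaluated for every element and a per-lane mask selects the result, so cycles are spent on inactive lanes. Vector-thread and SIMT designs instead let each microthread follow its own control path. Names and the doubling example are placeholders, not from the cited work.

#include <stddef.h>

/* Scalar emulation of masked (predicated) SIMD execution of a branch. */
void threshold_masked(size_t n, const float *src, float *dst, float cutoff) {
    for (size_t i = 0; i < n; i++) {
        int active = src[i] > cutoff;       /* per-lane predicate */
        float taken = src[i] * 2.0f;        /* "then" arm, always computed */
        float not_taken = 0.0f;             /* "else" arm, always computed */
        dst[i] = active ? taken : not_taken; /* blend under the mask */
    }
}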
“…Recent industry and academic efforts have focused on processor customization as a solution to improve performance and energy efficiency, also taking advantage of rising transistor counts. It has been well established that customizing processor data paths and data storage elements to suit the data flow of specific applications, which subsequently reduces overheads due to instruction fetching and decoding, can lead to improved performance and energy efficiency [16,27,5,15,8,14,22]. Most of these heterogeneous architectures operate on the principle of executing sequential code on a general-purpose core and offloading computation with data-level parallelism onto specialized, energy-efficient functional units.…”
Section: Introduction (mentioning)
confidence: 99%
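The offload principle described in that statement can be sketched as follows; accel_map() is a hypothetical stand-in (here just a host-side loop), not an interface from any of the cited papers. Sequential control stays on the general-purpose core, and only the data-parallel body is handed to the "accelerator".

#include <stddef.h>

typedef void (*dlp_kernel_t)(size_t i, void *args);

/* Placeholder for a real accelerator launch: iterate on the host. */
static void accel_map(dlp_kernel_t kernel, size_t n, void *args) {
    for (size_t i = 0; i < n; i++)
        kernel(i, args);
}

struct scale_args { float a; const float *x; float *y; };

static void scale_body(size_t i, void *p) {
    struct scale_args *s = p;
    s->y[i] = s->a * s->x[i];
}

void scale_offload(size_t n, float a, const float *x, float *y) {
    struct scale_args args = { a, x, y };
    accel_map(scale_body, n, &args);   /* only this region is "offloaded" */
}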
“…Such compute-intensive workloads are often targeted by single instruction multiple data (SIMD) architectures [73], [33], [53], [52], [49] to exploit the data parallelism that is often inherent in these applications. Accelerator-rich platforms [55], [83], [26], [14], in particular, are well-suited for targeting these workloads.…”
Section: Introduction (mentioning)
confidence: 99%
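As one concrete, widely available instance of the subword-SIMD pattern from the paper's taxonomy, the following sketch uses x86 SSE intrinsics to add two float arrays four elements per instruction, with a scalar loop for the tail. This is purely illustrative and not code from the cited works.

#include <immintrin.h>   /* x86 SSE intrinsics: one concrete subword-SIMD ISA */
#include <stddef.h>

void vadd_sse(size_t n, const float *a, const float *b, float *c) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)       /* scalar cleanup for leftover elements */
        c[i] = a[i] + b[i];
}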
“…The VIPERS soft VP is a general-purpose accelerator that can achieve a 44× speedup compared to the Nios II scalar processor [7]; it increases … A major challenge with these VPs is slow memory accesses. Comprehensive explorations of MIMD, vector-SIMD, and vector-thread architectures in handling regular and irregular DLP confirm that vector-based microarchitectures are more area- and energy-efficient than their scalar counterparts, even for irregular DLP [14]. Lo et al. [15] introduced an improved SIMD architecture targeted at video processing.…”
Section: Introduction (mentioning)
confidence: 99%
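A brief, hypothetical illustration of the memory-access challenge noted above: unit-stride streams suit the wide, contiguous transfers vector memory units are built around, whereas large-stride accesses do not, so memory rather than compute becomes the bottleneck. The example walks one column of a row-major matrix; the function name is a placeholder.

#include <stddef.h>

/* Strided access: each load is `cols` floats apart, defeating wide,
 * contiguous vector memory transfers. */
void copy_column(size_t rows, size_t cols, const float *m,
                 float *out, size_t col) {
    for (size_t r = 0; r < rows; r++)
        out[r] = m[r * cols + col];   /* stride = cols elements */
}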