Neural Acceleration for General-Purpose Approximate Programs

Esmaeilzadeh, Hadi; Sampson, Adrian; Ceze, Luís; Burger, Doug

doi:10.1109/mm.2013.28

Cited by 68 publications (77 citation statements)

References 28 publications


“…We perform rigorous validation of Aladdin against handwritten RTL implementations and a commercial HLS design flow. We show that Aladdin can model the behavior of recently published accelerators [38,55,19] and typical accelerator kernels [17] (Section 4).…”

Section: Contributions (mentioning, confidence: 98%)

“…Hardware acceleration exists in many forms, such as analog accelerators [6,50], static [13,19,28,38,43,52,55] and dynamic datapath accelerators [14,25,27], and programmable accelerators, such as GPUs and DSPs. In this work, we focus on static datapath accelerators.…”

Section: Background and Motivation (mentioning, confidence: 99%)

“…Starting with an unconstrained program DDDG, which corresponds to an initial representation of accelerator hardware, Aladdin applies optimizations as well as constraints to the graph to create a realistic model of accelerator activity. We rigorously validated Aladdin against RTL implementations of accelerators from both handwritten Verilog and a commercial HLS tool for a range of applications, including accelerators in Memcached [38], HARP [55], NPU [19], and a commonly used throughput-oriented benchmark suite, SHOC [17]. Our results show that Aladdin can model performance within 0.9%, power within 4.9%, and area within 6.6% compared to accelerator designs generated by traditional RTL flows.…”

Section: Introduction (mentioning, confidence: 99%)


Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures

Shao, Reagen, Wei, et al. 2014

2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)


Hardware specialization, in the form of accelerators that provide custom datapath and control for specific algorithms and applications, promises impressive performance and energy advantages compared to traditional architectures. Current research in accelerator analysis relies on RTL-based synthesis flows to produce accurate timing, power, and area estimates. Such techniques not only require significant effort and expertise but are also slow and tedious to use, making large design space exploration infeasible. To overcome this problem, we present Aladdin, a pre-RTL, power-performance accelerator modeling framework and demonstrate its application to system-on-chip (SoC) simulation. Aladdin estimates performance, power, and area of accelerators within 0.9%, 4.9%, and 6.6% with respect to RTL implementations. Integrated with architecture-level core and memory hierarchy simulators, Aladdin provides researchers an approach to model the power and performance of accelerators in an SoC environment.
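The quoted description of Aladdin starts from an unconstrained DDDG (dynamic data dependence graph) and then applies constraints to the graph to approximate real accelerator hardware. The toy Python sketch below illustrates only that general idea and is not Aladdin's actual algorithm. It schedules a small, hypothetical dependence graph of multiplies and adds with unit latency, first with unlimited functional units, so that latency is bound only by the critical path, and then with an assumed per-cycle limit on units of each type. The graph `DDDG`, the functions `schedule` and `latency`, and the greedy list-scheduling policy are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical DDDG: node id -> (operation type, list of predecessor node ids).
# Nodes are listed in a valid topological order (a dependence graph is acyclic).
DDDG = {
    "m0": ("mul", []),
    "m1": ("mul", []),
    "m2": ("mul", []),
    "m3": ("mul", []),
    "a0": ("add", ["m0", "m1"]),
    "a1": ("add", ["m2", "m3"]),
    "a2": ("add", ["a0", "a1"]),
}


def schedule(graph, limits=None):
    """Greedy list scheduling, assuming every operation takes one cycle.

    `limits` maps an operation type to the number of functional units of that
    type usable per cycle. A missing entry means unlimited units, so with no
    limits at all the result is the unconstrained, critical-path-bound schedule.
    Every limit must be at least 1, otherwise the loop never terminates.
    Returns a dict mapping node id -> cycle in which the node executes.
    """
    limits = limits or {}
    done = {}  # node id -> cycle it was scheduled in
    cycle = 0
    while len(done) < len(graph):
        used = defaultdict(int)  # functional units of each type used this cycle
        for node, (op, preds) in graph.items():
            if node in done:
                continue
            # Ready only if every predecessor finished in an earlier cycle.
            if not all(p in done and done[p] < cycle for p in preds):
                continue
            # Respect the resource constraint for this operation type.
            if used[op] >= limits.get(op, float("inf")):
                continue
            done[node] = cycle
            used[op] += 1
        cycle += 1
    return done


def latency(sched):
    """Total cycles: last scheduled cycle plus one."""
    return max(sched.values()) + 1


if __name__ == "__main__":
    print(latency(schedule(DDDG)))                          # 3
    print(latency(schedule(DDDG, {"mul": 1, "add": 1})))    # 6
    print(latency(schedule(DDDG, {"mul": 2})))              # 4
```

With one multiplier and one adder, the same seven operations take 6 cycles instead of 3; a gap of this kind between an idealized graph and a resource-constrained design is what a pre-RTL model has to account for.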


“…Like most of the standalone accelerators [15][16][17], the accelerator in the proposed system, called the reconfigurable application specified processor (RASP), is a loosely coupled configurable engine attached to the AXI bus. Data communication between on-chip memory and external storage (i.e., DDR3) is provided through direct memory access (DMA) in the RASP.…”

Section: System Architecture Overview (mentioning, confidence: 99%)

Design and Application Space Exploration of a Domain-Specific Accelerator System

Feng, Wang, et al. 2018

Electronics


Domain-specific accelerators are a response to device scaling and the dark silicon era. This paper describes a configurable accelerator oriented toward radar signal processing and the application space exploration of the system. The system is built around accelerator engines and general-purpose processors (GPPs), which make it suitable for both compute-intensive kernel acceleration and complex control tasks. It is geared toward high-performance radar digital signal processing; we characterize the applications and find that each of them contains a series of serializable kernels. Taking advantage of this observation, we design an algorithm pool that shares the same computation and memory resources, and each algorithm is size-reconfigurable. In addition, a shared on-chip addressable scratchpad memory eliminates unnecessary explicit data copies between accelerators. Performance of the system is evaluated from measurements performed both on an FPGA SoC test chip and on a prototype chip fabricated in 40 nm CMOS technology. The experimental results show that, across different algorithms, the proposed system achieves a 1.9× to 10.1× performance gain compared with a state-of-the-art TI DSP chip. To characterize the application of the system, a complex real-life task is adopted, and the results show that it can obtain high throughput and desirable precision.


“…Examples include neural network accelerators for signal processing [3], digital approximate computing accelerators that leverage neural network algorithms [4], and heterogeneous systems built with GPUs and CPUs for deep learning accelerations [5]. However, traditional CMOS technology has scaling limitations for neuromorphic system design, as many transistors are usually required to build one neuron [5].…”

Section: Introduction (mentioning, confidence: 99%)

Hardware acceleration for neuromorphic computing: An evolving view

Liu et al. 2015

2015 15th Non-Volatile Memory Technology Symposium (NVMTS)


Abstract: The rapid growth of the computing capacity of modern microprocessors enables the wide adoption of machine learning and neural network models. The ever-increasing demand for performance, combined with concern over power budgets, has motivated recent research on hardware acceleration for these learning algorithms. A wide spectrum of hardware platforms has been extensively studied, from conventional heterogeneous computing systems to emerging nanoscale systems. In this paper, we review the ongoing efforts at the Evolutionary Intelligence Laboratory (www.ei-lab.org) on hardware acceleration for neuromorphic computing and machine learning. Realizations on various platforms, such as FPGAs, on-chip heterogeneous processors, and memristor-based ASIC designs, will be explored. An evolving view of accelerator designs for learning algorithms will also be presented.
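The quoted introduction describes [4] as a digital approximate computing accelerator that leverages neural network algorithms. As a rough illustration of that general idea, and not the method of Esmaeilzadeh et al., the sketch below trains a tiny multilayer perceptron (one input, four sigmoid hidden units, one linear output) with full-batch gradient descent to mimic a hypothetical approximable function, here `target(x) = x * x` on [0, 1]. The network size, learning rate, epoch count, and every name in the code (`target`, `TinyMLP`, `train`, `mse`) are assumptions chosen only to keep the example small and dependency-free.

```python
import math
import random


def target(x):
    # Hypothetical "approximable" code region that the network stands in for.
    return x * x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """1 input -> `hidden` sigmoid units -> 1 linear output."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-1, 1) for _ in range(hidden)]  # input->hidden weights
        self.b = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden biases
        self.v = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden->output weights
        self.c = 0.0                                          # output bias

    def forward(self, x):
        """Return (output, hidden activations) for a scalar input x."""
        h = [sigmoid(w * x + b) for w, b in zip(self.w, self.b)]
        return sum(v * hj for v, hj in zip(self.v, h)) + self.c, h

    def __call__(self, x):
        return self.forward(x)[0]

    def mse(self, xs):
        """Mean squared error against the precise function on inputs xs."""
        return sum((self(x) - target(x)) ** 2 for x in xs) / len(xs)

    def train(self, xs, lr=0.1, epochs=2000):
        """Full-batch gradient descent on the mean squared error."""
        n, hidden = len(xs), len(self.w)
        for _ in range(epochs):
            gw, gb, gv, gc = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
            for x in xs:
                y, h = self.forward(x)
                err = 2.0 * (y - target(x)) / n  # d(MSE)/d(output) for this sample
                gc += err
                for j in range(hidden):
                    gv[j] += err * h[j]
                    # Back-propagate through the sigmoid: dh/dz = h * (1 - h).
                    dz = err * self.v[j] * h[j] * (1.0 - h[j])
                    gw[j] += dz * x
                    gb[j] += dz
            # Apply the accumulated gradients after the full pass over xs.
            for j in range(hidden):
                self.w[j] -= lr * gw[j]
                self.b[j] -= lr * gb[j]
                self.v[j] -= lr * gv[j]
            self.c -= lr * gc


if __name__ == "__main__":
    xs = [i / 10 for i in range(11)]
    net = TinyMLP()
    print("MSE before training:", net.mse(xs))
    net.train(xs)
    print("MSE after training: ", net.mse(xs))
    print("net(0.5) vs target(0.5):", net(0.5), target(0.5))
```

Once trained, calling `net(x)` replaces `target(x)` with a fixed sequence of multiply-adds and sigmoid evaluations, a regular computation that dedicated neural hardware could plausibly evaluate; the cost is the approximation error reported by `mse`.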
