International Symposium on Low Power Electronics and Design (ISLPED) 2013
DOI: 10.1109/islped.2013.6629329

Quantifying acceleration: Power/performance trade-offs of application kernels in hardware

Abstract: As the traditional performance gains of technology scaling diminish, one of the most promising directions is building special purpose fixed function hardware blocks, commonly referred to as accelerators. Accelerators have become prevalent in industrial SoC designs for their low power, high performance potential. In this work we explore thousands of implementations of classical software workloads in hardware. This thorough, detailed design space search of hardware accelerators gives architects a quanti…

Cited by 15 publications (6 citation statements)
References 10 publications
“…Moreover, in multicore systems where dozens of cores are involved, it is difficult to find configurations that maximize throughput with a power constraint [22]. In [23] Reagen et al explored the design space for low-power accelerators used in SoC designs to obtain power and performance trade-off. Qadri et al [10] proposed a scheme called E-FLORE to optimize multicore architectures.…”
Section: Related Work (mentioning)
confidence: 99%
“…Specifying more memory banks and more functional units yield better performance with increased power costs [24], but take the data transmission bandwidth restraint into account, additional partitioning becomes wasteful for there is no enough bandwidth to feed data efficiently. In the proposed accelerator, we use six PEs, and four of them have the same structure to complete a four-channel parallel operation and we divide the scratchpad into 32 banks, with the total size of 2 MByte.…”
Section: Algorithm Speedup Ratio (mentioning)
confidence: 99%
“…Accelerators have been shown to provide orders of magnitude more energy efficiency than general-purpose computing. Despite their narrow domain, the lack of predetermined structures and the ability to consider radically different techniques leads to a large design space [15].…”
Section: Hardware Accelerators (mentioning)
confidence: 99%