2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors 2013
DOI: 10.1109/asap.2013.6567572

Transforming a linear algebra core to an FFT accelerator

Abstract: This paper considers the modifications required to transform a highly efficient, specialized linear algebra core into an efficient engine for computing Fast Fourier Transforms (FFTs). We review the minimal changes required to support Radix-4 FFT computations and propose extensions to the micro-architecture of the baseline linear algebra core. Along the way, we study the critical differences between the two classes of algorithms. Special attention is paid to the configuration of the on-chip memory syst…

Citations: cited by 15 publications (8 citation statements)
References: 21 publications (24 reference statements)
“…For example, [9], [10] explore design space tradeoffs for on-chip problem sizes. [3] indeed addresses the memory bandwidth problem but not at a level of detail that includes DRAM row-buffer effects.…”
Section: Related Work (mentioning)
confidence: 99%
“…These include software implementations on CPUs [6], GPUs [7], and supercomputers [8], and hardware implementations based on ASIC [9], [4] or FPGA [1]. Further there are studies on design automation frameworks for FFTs.…”
Section: Related Work (mentioning)
confidence: 99%
“…Given the high parallelism of the algorithms, it is possible to schedule threads without being influenced by data dependencies [8]. To achieve a least-energy solution, we assume two different clocks for the CFMA and the register file, trading parallelism versus pipelining in the access to the register file [8,16].…”
Section: Functional Unit (mentioning)
confidence: 99%
“…The FFT can be similarly expressed in terms of CFMAs [14,16]. For instance in the case of a Radix-2 Cooley-Tukey implementation, a butterfly between complex operands requires 10 floating-point operations, and it can be implemented as two CFMAs.…”
Section: Functional Unit (mentioning)
confidence: 99%
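
The flop count in the statement above can be checked with a small sketch. It assumes a CFMA computes a + b * c on complex operands; the names `cfma`, `radix2_butterfly`, and `butterfly_flops` are hypothetical and are not taken from the cited papers. The count of 10 real operations holds when the product w * x1 is computed once and shared by both outputs.

```python
def cfma(a: complex, b: complex, c: complex) -> complex:
    """Hypothetical complex fused multiply-add: returns a + b * c."""
    return a + b * c


def radix2_butterfly(x0: complex, x1: complex, w: complex):
    """Radix-2 Cooley-Tukey butterfly written as two CFMAs.

    top    = x0 + w * x1
    bottom = x0 - w * x1
    """
    top = cfma(x0, w, x1)
    bottom = cfma(x0, -w, x1)  # same product, opposite sign
    return top, bottom


def butterfly_flops() -> int:
    """Real floating-point operations when w * x1 is computed once."""
    complex_mul = 4 + 2   # 4 real multiplies + 2 real adds
    complex_add = 2       # 2 real adds (or subtracts)
    return complex_mul + 2 * complex_add


if __name__ == "__main__":
    # With w = 1 the butterfly reduces to a 2-point DFT of (x0, x1).
    print(radix2_butterfly(1 + 0j, 1 + 0j, 1 + 0j))
    print(butterfly_flops())
```

Writing the second output as a CFMA with a negated twiddle keeps both halves of the butterfly on the same functional unit, which matches the statement's framing; real hardware would likely reuse the shared product rather than recompute it.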
“…Specifically, we explored the mapping of FFTs, which are an important signal processing kernel [110]. While GEMM is a straightforward kernel with simple, predictable data access patterns, the FFT provides more challenges to obtaining high performance.…”
Section: Fast Fourier Transform (mentioning)
confidence: 99%