Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000
DOI: 10.1109/ipdps.2000.846054

Dynamic data layouts for cache-conscious factorization of DFT

Abstract: Effective utilization of cache memories is a key factor in achieving high performance in computing the Discrete Fourier Transform (DFT). Most optimization techniques for computing the DFT rely on either modifying the computation and data access order or exploiting low level platform specific details, while keeping the data layout in memory static. In this paper, we propose a high level optimization technique, dynamic data layout (DDL). In DDL, data reorganization is performed between computations to effectivel…
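To make the abstract's idea concrete, below is a minimal C sketch of a dynamic-data-layout style DFT factorization: an N = N1*N2 point transform computed as batches of small row DFTs, with explicit data reorganizations (transposes) inserted between stages so that every sub-transform reads and writes unit-stride data. The six-step structure, the naive row_dft kernel, and all function names here are illustrative assumptions, not the paper's actual implementation.

```c
/* Sketch of a dynamic-data-layout DFT: N = N1*N2, six-step factorization
 * with explicit transposes between stages so each sub-DFT is unit-stride. */
#include <complex.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Naive length-n DFT of one contiguous row (stand-in for an optimized kernel). */
static void row_dft(const double complex *in, double complex *out, int n)
{
    for (int k = 0; k < n; k++) {
        double complex acc = 0;
        for (int j = 0; j < n; j++)
            acc += in[j] * cexp(-2.0 * I * M_PI * j * k / n);
        out[k] = acc;
    }
}

/* Data reorganization: transpose a rows x cols row-major matrix. */
static void transpose(const double complex *in, double complex *out,
                      int rows, int cols)
{
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            out[c * rows + r] = in[r * cols + c];
}

/* N1*N2-point DFT of x into X, reorganizing the data layout between stages. */
static void ddl_dft(const double complex *x, double complex *X, int N1, int N2)
{
    int N = N1 * N2;
    double complex *a = malloc((size_t)N * sizeof *a);
    double complex *b = malloc((size_t)N * sizeof *b);

    /* 1. View x as N1 x N2; transpose so the length-N1 sub-DFTs, which would
     *    otherwise walk stride-N2 columns, become contiguous rows. */
    transpose(x, a, N1, N2);                       /* a is N2 x N1 */

    /* 2. N2 row DFTs of length N1, all unit-stride. */
    for (int n2 = 0; n2 < N2; n2++)
        row_dft(a + n2 * N1, b + n2 * N1, N1);

    /* 3. Twiddle factors W_N^(n2*k1). */
    for (int n2 = 0; n2 < N2; n2++)
        for (int k1 = 0; k1 < N1; k1++)
            b[n2 * N1 + k1] *= cexp(-2.0 * I * M_PI * n2 * k1 / N);

    /* 4. Reorganize again so the length-N2 sub-DFTs are contiguous. */
    transpose(b, a, N2, N1);                       /* a is N1 x N2 */

    /* 5. N1 row DFTs of length N2, again unit-stride. */
    for (int k1 = 0; k1 < N1; k1++)
        row_dft(a + k1 * N2, b + k1 * N2, N2);

    /* 6. Final reorganization to natural order: X[k1 + k2*N1] = b[k1*N2 + k2]. */
    transpose(b, X, N1, N2);

    free(a);
    free(b);
}

int main(void)
{
    enum { N1 = 4, N2 = 8, N = N1 * N2 };
    double complex x[N], X[N];
    for (int n = 0; n < N; n++)
        x[n] = cos(2.0 * M_PI * 3 * n / N);        /* pure tone in bin 3 */

    ddl_dft(x, X, N1, N2);
    for (int k = 0; k < N; k++)
        printf("X[%2d] = %6.2f %+6.2fi\n", k, creal(X[k]), cimag(X[k]));
    return 0;
}
```

The explicit transposes are one possible form of the "data reorganization between computations" the abstract describes; whether they pay off depends on whether the cache misses saved in the sub-DFTs outweigh the cost of the copies.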

Cited by 8 publications (10 citation statements)
References 12 publications
“…Chatterjee et al discuss layout optimizations for a suite of dense matrix kernels in [7]. Park and Prasanna discuss dynamic data remapping to improve cache performance for the DFT in [31]. One characteristic that all these problems share is a very regular memory accesses that are known at compile time.…”
Section: Related Work
confidence: 99%
“…While memory density has been growing rapidly, the speed of memory has been far outpaced by the speed of modern processors [32]. This phenomenon has resulted in severe application level performance degradation on high-end systems and has been well studied for many dense linear algebra problems like matrix multiplication and FFT [31][41] [45]. A number of groups are attempting to improve performance by performing computations in memory [6] [25].…”
Section: Introduction
confidence: 99%
“…The idea of platform adaptive loop body interleaving is introduced in [23] as an extension to FFTW and as an example of a general adaptation idea for divide and conquer algorithms [24]. Another variant of computing the DFT studies adaptation through runtime permutations versus re-addressing [25], [26]. Adaptive libraries for the related Walsh-Hadamard transform (WHT), based on similar ideas, have been developed in [27].…”
Section: Introduction
confidence: 99%
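The "runtime permutations versus re-addressing" contrast mentioned in the statement above can be illustrated with a small C sketch (hypothetical code, assuming [25], [26] refer to the dynamic-data-layout approach): re-addressing applies the sub-DFT kernel directly to strided columns through index arithmetic, while the permutation variant first copies a column into a contiguous scratch buffer and runs the same kernel at unit stride.

```c
#include <complex.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Length-n DFT where consecutive logical inputs are `in_stride` apart in memory. */
static void dft_stride(const double complex *in, double complex *out,
                       int n, int in_stride)
{
    for (int k = 0; k < n; k++) {
        double complex acc = 0;
        for (int j = 0; j < n; j++)
            acc += in[j * in_stride] * cexp(-2.0 * I * M_PI * j * k / n);
        out[k] = acc;
    }
}

/* Re-addressing: transform column `c` of an N1 x N2 row-major matrix directly
 * via stride-N2 accesses -- no data movement, but poor spatial locality. */
static void column_dft_readdress(const double complex *x, double complex *out,
                                 int N1, int N2, int c)
{
    dft_stride(x + c, out, N1, N2);
}

/* Runtime permutation: gather the column into a contiguous scratch buffer
 * first, then run the same kernel at unit stride -- extra copies, better reuse. */
static void column_dft_permute(const double complex *x, double complex *out,
                               int N1, int N2, int c)
{
    double complex scratch[1024];            /* assumes N1 <= 1024 for this sketch */
    for (int j = 0; j < N1; j++)
        scratch[j] = x[j * N2 + c];
    dft_stride(scratch, out, N1, 1);
}
```

Which variant wins depends on the transform size relative to the cache hierarchy, which is exactly the platform-dependent choice an adaptive library can make at runtime.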
“…Frigo and others discuss the cache performance of cache oblivious algorithms for matrix transpose, FFT, and sorting in [9]. Park and Prasanna discuss dynamic data remapping to improve cache performance for the DFT in [13]. One characteristic that these problems share is a very regular memory accesses that are known at compile time.…”
Section: Related Work
confidence: 99%