Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000
DOI: 10.1109/ipdps.2000.846054

Dynamic data layouts for cache-conscious factorization of DFT

Abstract: Effective utilization of cache memories is a key factor in achieving high performance in computing the Discrete Fourier Transform (DFT). Most optimization techniques for computing the DFT rely on either modifying the computation and data access order or exploiting low level platform specific details, while keeping the data layout in memory static. In this paper, we propose a high level optimization technique, dynamic data layout (DDL). In DDL, data reorganization is performed between computations to effectivel…
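To make the abstract's idea concrete, below is a minimal C sketch of a dynamic-data-layout style DFT factorization: an N = N1*N2 point transform computed as batches of small row DFTs, with explicit data reorganizations (transposes) inserted between stages so that every sub-transform reads and writes unit-stride data. The six-step structure, the naive row_dft kernel, and all function names here are illustrative assumptions, not the paper's actual implementation.

```c
/* Sketch of a dynamic-data-layout DFT: N = N1*N2, six-step factorization
 * with explicit transposes between stages so each sub-DFT is unit-stride. */
#include <complex.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Naive length-n DFT of one contiguous row (stand-in for an optimized kernel). */
static void row_dft(const double complex *in, double complex *out, int n)
{
    for (int k = 0; k < n; k++) {
        double complex acc = 0;
        for (int j = 0; j < n; j++)
            acc += in[j] * cexp(-2.0 * I * M_PI * j * k / n);
        out[k] = acc;
    }
}

/* Data reorganization: transpose a rows x cols row-major matrix. */
static void transpose(const double complex *in, double complex *out,
                      int rows, int cols)
{
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            out[c * rows + r] = in[r * cols + c];
}

/* N1*N2-point DFT of x into X, reorganizing the data layout between stages. */
static void ddl_dft(const double complex *x, double complex *X, int N1, int N2)
{
    int N = N1 * N2;
    double complex *a = malloc((size_t)N * sizeof *a);
    double complex *b = malloc((size_t)N * sizeof *b);

    /* 1. View x as N1 x N2; transpose so the length-N1 sub-DFTs, which would
     *    otherwise walk stride-N2 columns, become contiguous rows. */
    transpose(x, a, N1, N2);                       /* a is N2 x N1 */

    /* 2. N2 row DFTs of length N1, all unit-stride. */
    for (int n2 = 0; n2 < N2; n2++)
        row_dft(a + n2 * N1, b + n2 * N1, N1);

    /* 3. Twiddle factors W_N^(n2*k1). */
    for (int n2 = 0; n2 < N2; n2++)
        for (int k1 = 0; k1 < N1; k1++)
            b[n2 * N1 + k1] *= cexp(-2.0 * I * M_PI * n2 * k1 / N);

    /* 4. Reorganize again so the length-N2 sub-DFTs are contiguous. */
    transpose(b, a, N2, N1);                       /* a is N1 x N2 */

    /* 5. N1 row DFTs of length N2, again unit-stride. */
    for (int k1 = 0; k1 < N1; k1++)
        row_dft(a + k1 * N2, b + k1 * N2, N2);

    /* 6. Final reorganization to natural order: X[k1 + k2*N1] = b[k1*N2 + k2]. */
    transpose(b, X, N1, N2);

    free(a);
    free(b);
}

int main(void)
{
    enum { N1 = 4, N2 = 8, N = N1 * N2 };
    double complex x[N], X[N];
    for (int n = 0; n < N; n++)
        x[n] = cos(2.0 * M_PI * 3 * n / N);        /* pure tone in bin 3 */

    ddl_dft(x, X, N1, N2);
    for (int k = 0; k < N; k++)
        printf("X[%2d] = %6.2f %+6.2fi\n", k, creal(X[k]), cimag(X[k]));
    return 0;
}
```

The explicit transposes are one possible form of the "data reorganization between computations" the abstract describes; whether they pay off depends on whether the cache misses saved in the sub-DFTs outweigh the cost of the copies.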

Cited by 8 publications (10 citation statements)
References 12 publications
“…Chatterjee et al discuss layout optimizations for a suite of dense matrix kernels in [7]. Park and Prasanna discuss dynamic data remapping to improve cache performance for the DFT in [31]. One characteristic that all these problems share is a very regular memory accesses that are known at compile time.…”
Section: Related Work
confidence: 99%
“…While memory density has been growing rapidly, the speed of memory has been far outpaced by the speed of modern processors [32]. This phenomenon has resulted in severe application level performance degradation on high-end systems and has been well studied for many dense linear algebra problems like matrix multiplication and FFT [31][41] [45]. A number of groups are attempting to improve performance by performing computations in memory [6] [25].…”
Section: Introduction
confidence: 99%
“…The idea of platform adaptive loop body interleaving is introduced in [23] as an extension to FFTW and as an example of a general adaptation idea for divide and conquer algorithms [24]. Another variant of computing the DFT studies adaptation through runtime permutations versus re-addressing [25], [26]. Adaptive libraries for the related Walsh-Hadamard transform (WHT), based on similar ideas, have been developed in [27].…”
Section: Introduction
confidence: 99%
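The "runtime permutations versus re-addressing" contrast mentioned in the statement above can be illustrated with a small C sketch (hypothetical code, assuming [25], [26] refer to the dynamic-data-layout approach): re-addressing applies the sub-DFT kernel directly to strided columns through index arithmetic, while the permutation variant first copies a column into a contiguous scratch buffer and runs the same kernel at unit stride.

```c
#include <complex.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Length-n DFT where consecutive logical inputs are `in_stride` apart in memory. */
static void dft_stride(const double complex *in, double complex *out,
                       int n, int in_stride)
{
    for (int k = 0; k < n; k++) {
        double complex acc = 0;
        for (int j = 0; j < n; j++)
            acc += in[j * in_stride] * cexp(-2.0 * I * M_PI * j * k / n);
        out[k] = acc;
    }
}

/* Re-addressing: transform column `c` of an N1 x N2 row-major matrix directly
 * via stride-N2 accesses -- no data movement, but poor spatial locality. */
static void column_dft_readdress(const double complex *x, double complex *out,
                                 int N1, int N2, int c)
{
    dft_stride(x + c, out, N1, N2);
}

/* Runtime permutation: gather the column into a contiguous scratch buffer
 * first, then run the same kernel at unit stride -- extra copies, better reuse. */
static void column_dft_permute(const double complex *x, double complex *out,
                               int N1, int N2, int c)
{
    double complex scratch[1024];            /* assumes N1 <= 1024 for this sketch */
    for (int j = 0; j < N1; j++)
        scratch[j] = x[j * N2 + c];
    dft_stride(scratch, out, N1, 1);
}
```

Which variant wins depends on the transform size relative to the cache hierarchy, which is exactly the platform-dependent choice an adaptive library can make at runtime.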
“…Frigo and others discuss the cache performance of cache oblivious algorithms for matrix transpose, FFT, and sorting in [9]. Park and Prasanna discuss dynamic data remapping to improve cache performance for the DFT in [13]. One characteristic that these problems share is a very regular memory accesses that are known at compile time.…”
Section: Related Work
confidence: 99%