Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis 2009
DOI: 10.1145/1654059.1654090

Auto-tuning 3-D FFT library for CUDA GPUs

Cited by 106 publications (62 citation statements)
References 9 publications
“…Several CUDA implementations for linear algebra subroutines and FFTs with auto-tuning capability already exist [7,12,19].…”
Section: Related Work (mentioning)
confidence: 99%
“…Relative to this work, our contribution is to show how to do fast predictive auto-tuning that satisfies the requirements to: (a) handle the variety of recent multicore architectures like GPUs [Schaa and Kaeli, 2009], (b) provide high-performance domain-specific libraries [Nukada and Matsuoka, 2009, Li et al, 2009, Kamil et al, 2010], (c) that select good implementations at run-time [Klöckner et al, 2011, Pinto and Cox, 2012], and (d) for the full input domain of a library routine [Liu et al, 2009, Grauer-Gray and.…”
Section: Auto-tuning (mentioning)
confidence: 99%
“…As shown in [16] for example, it can be far more beneficial to recompute large segments of constant values instead of fetching them from main memory. Others [8] show that, in some cases, the most direct algorithm can outperform the CPU optimized one. Another source of performance loss is thread divergence due to asymmetrical branching in control flow.…”
Section: MPI Parallelism (mentioning)
confidence: 99%