2022
DOI: 10.1007/s11227-022-04570-9

Memory-accelerated parallel method for multidimensional fast fourier implementation on GPU

Cited by 4 publications (3 citation statements); references 34 publications.

“…Parallelization strategies [for] FHT have also evolved, with recent works exploring advanced techniques such as dynamic parallelism [39], shared memory utilization [25], and warp-level programming [40]. These approaches aim to reduce the computational overhead and improve the scalability of the FHT on GPUs, enabling its application in more complex and data-intensive scenarios.…”
Section: Related Work (mentioning)
confidence: 99%
“…AMD also released the rocFFT library that runs on the Radeon Open Computing Platform (ROCm) [18]. Hu and others [19] optimized the memory-access pattern for FFT computations on ROCm. The proposed implementation achieved speed enhancements ranging from 25% to 250% compared with the speed of rocFFT.…”
Section: Related Work (mentioning)
confidence: 99%
“…Migrating existing FT algorithms directly to NPUs is infeasible due to the distinctive hardware architecture of NPUs. This is because the matrix and vector units of NPUs process data blocks in a serial manner, while GPUs and FPGAs process bytes with multiple threads in parallel [15]. Consequently, the conventional fast FT algorithm may not be suitable for NPUs.…”
Section: Introduction (mentioning)
confidence: 99%
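The last statement contrasts block-oriented matrix units with the thread-parallel execution that conventional FFT code assumes. The citing paper does not say how it maps the Fourier transform onto NPUs, so the following Python sketch only illustrates that contrast under one stated assumption: that a matrix unit favors large, dense, regular products. It computes the same DFT two ways: as a single dense matrix-vector product (regular blocks, O(n²) arithmetic) and as a radix-2 Cooley-Tukey FFT (O(n log n) arithmetic, but built from small butterflies over strided sub-sequences). The function names `dft_matrix`, `dft_matmul`, and `fft_radix2` are hypothetical and do not come from any cited work.

```python
import cmath


def dft_matrix(n):
    """Dense n x n DFT matrix with entries F[j][k] = exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def dft_matmul(x):
    """DFT as one dense matrix-vector product.

    Assumption for illustration: this regular, block-shaped workload is the
    kind a matrix unit handles well, at the cost of O(n^2) arithmetic.
    """
    n = len(x)
    f = dft_matrix(n)
    return [sum(f[j][k] * x[k] for k in range(n)) for j in range(n)]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    O(n log n) arithmetic, but the work is split into even/odd strided
    sub-sequences and combined by small two-point butterflies rather than
    large dense blocks.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # elements at even indices
    odd = fft_radix2(x[1::2])   # elements at odd indices
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd half, then one butterfly.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

For example, both functions return the same spectrum for `[1, 2, 3, 4, 0, 0, 0, 0]` up to floating-point rounding, and its zero-frequency term is the sum of the inputs, 10. The sketch says nothing about performance on real NPUs; it only shows why the two formulations stress hardware differently.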