2021
DOI: 10.1007/s11227-020-03591-6

Evaluating the performance of FFT library implementations on modern hybrid computing systems

Cited by 5 publications (2 citation statements); references 23 publications.

“…More recently, the research community is focusing on developing efficient FFT implementations targeting emerging architectures with different degrees of parallelism, e.g., high number of cores and long SIMD or vector units. Chow et al [3] report their effort in taking advantage of the IBM Cell BE for the computation of large FFTs; Anderson et al [1] make use of FPGAs for accelerating 3D FFTs; Wang et al [13] present an FFT optimization for Armv8 architectures; Malkovsky et al [9] evaluate FFTs on heterogeneous HPC compute nodes including GP-GPUs. Most of those studies are limited to up to 8-element SIMD units in CPUs or high thread-level parallelism in GPUs while the implementations proposed in our paper are targeting wider vector units.…”
Section: Introduction | Citation type: mentioning | Confidence: 99%

“…[4] More recently, the research community is focusing on developing efficient FFT implementations targeting emerging architectures with different degrees of parallelism, for example, high number of cores and long SIMD or vector units. Chow et al [5] report their effort in taking advantage of the IBM Cell BE for the computation of large FFTs; Anderson et al [6] make use of FPGAs for accelerating 3D FFTs; Wang et al [7] present an FFT optimization for Armv8 architectures; Malkovsky et al [8] evaluate FFTs on heterogeneous HPC compute nodes including GP-GPUs. Most of those studies are limited to up to 8-element SIMD units in CPUs or high thread-level parallelism in GPUs while the implementations proposed in our article are targeting wider vector units.…”
Section: Related Work | Citation type: mentioning | Confidence: 99%