2021
DOI: 10.48550/arxiv.2102.08463
Preprint

cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs

Abstract: Nonuniform fast Fourier transforms dominate the computational cost in many applications, including image reconstruction and signal processing. We thus present a general-purpose GPU-based CUDA library for type 1 (nonuniform to uniform) and type 2 (uniform to nonuniform) transforms in dimensions 2 and 3, in single or double precision. It achieves high performance for a given user-requested accuracy, regardless of the distribution of nonuniform points, via cache-aware point reordering, and load-balanced blocked spr…
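
To make the two transform types named in the abstract concrete, here is a minimal sketch in plain Python that evaluates them by direct summation. It is not the library's algorithm or API. It is written in 1D for brevity (the library itself targets dimensions 2 and 3), it costs O(NM) rather than running fast, and the function names, the centered mode ordering and the `isign` sign convention are all assumptions made for illustration.

```python
import cmath


def _centered_modes(n_modes):
    # Assumed mode ordering: integers k with -n_modes/2 <= k <= (n_modes - 1)/2.
    return range(-(n_modes // 2), (n_modes + 1) // 2)


def nufft1d_type1_direct(x, c, n_modes, isign=1):
    """Type 1 (nonuniform to uniform): f_k = sum_j c_j * exp(i * isign * k * x_j)."""
    return [
        sum(cj * cmath.exp(1j * isign * k * xj) for xj, cj in zip(x, c))
        for k in _centered_modes(n_modes)
    ]


def nufft1d_type2_direct(x, f, isign=1):
    """Type 2 (uniform to nonuniform): c_j = sum_k f_k * exp(i * isign * k * x_j)."""
    modes = _centered_modes(len(f))
    return [
        sum(fk * cmath.exp(1j * isign * k * xj) for k, fk in zip(modes, f))
        for xj in x
    ]
```

With these definitions, a type 2 transform with the opposite sign is the adjoint of the type 1 transform, which is the forward/adjoint pairing that iterative reconstruction codes typically rely on.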

Cited by 3 publications (3 citation statements)
References 23 publications

“…The Radon transform can similarly be computed by performing the above steps in reverse order. In this work we have used the cuFINUFFT (Shih et al, 2021) library to compute NUFFTs. For a detailed discussion on the topic, we refer the reader to Dutt & Rokhlin (1993), Fessler & Sutton (2003), Greengard & Lee (2004), Barnett et al (2019) and Barnett (2021).…”
Section: Figure (mentioning)
confidence: 99%
“…Most of the memory in 3D was occupied by the activations from the 3D convolutional neural networks used in the image denoising step in NC-PDNet. Memory-efficient implementations of NUFFT were carried out using tensorflow-nufft [34], which is based on TensorFlow implementations of cuFINUFFT [35].…”
Section: Practical Implementations (mentioning)
confidence: 99%
“…Merging and slicing operations are offloaded to GPUs using the cuFINUFFT [15] library, and data movement is handled using PyCUDA [16]. We compare performance of a multi-threaded instance using the FINUFFT [8] library with OpenMP to an equivalent CUDA implementation on an NVIDIA V100 and find that the forward function call runs approximately 1.5× faster, and the adjoint function call runs approximately 8× faster for our dataset.…”
Section: Acceleration - GPU Offloading (mentioning)
confidence: 99%