cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs

Shih, Yu-hsuan; Wright, Garrett; Andén, Joakim; Blaschke, Johannes; Barnett, Alex H.

doi:10.1109/ipdpsw52791.2021.00105

Cited by 14 publications

(5 citation statements)

References 24 publications


“…It is worth noting that the authors of FINUFFT [27] are working on a GPU implementation of the API which, at the time of writing, is still incomplete. According to their benchmarks [50], their implementation is much faster than gpuNUFFT [51] that we used in this paper. When cuFINUFFT is completed, we will integrate it in our implementation in order to produce the same output on both CPU and GPU, thus making the evaluation even fairer and, additionally, improving the performance.…”

Section: Discussion (mentioning)

confidence: 99%

Efficient Online 4D Magnetic Resonance Imaging

Barbone, Wetscherek, Yung, et al. 2021

2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)


Magnetic Resonance (MR)-guided online Adaptive RadioTherapy (MR-goART) utilises the excellent soft-tissue contrast of MR images taken just before the patient's treatment to quickly update and personalise radiotherapy treatment plans. Four-dimensional (4D) MR Imaging (MRI) can resolve variations in respiratory motion patterns. 4D MRI data can be used to adapt the radiation beams to maximally target the tumour while sparing as much healthy tissue as possible. 4D MRI reconstruction, however, is computationally challenging and current state-of-the-art implementations are unable to meet MR-goART time requirements. This study bridges the gap between high-performance computing and medical applications by developing and implementing a parallel, heterogeneous architecture for the XD-GRASP algorithm capable of meeting the MR-goART time requirements. Our architecture exploits long-vector instructions and utilises all available resources, while minimising and hiding the communication overhead when external GPUs are used. As a result, the reconstruction time was reduced from 994 seconds to just 90 seconds with a speed-up of more than 11x. In addition, we evaluated the impact of the emerging Processing-in-Memory (PIM) technology. Our simulation results show that 16 low power, in-order PIM cores with no SIMD unit are 2.7x faster than an Intel Core™ i7-9700 8-core CPU equipped with AVX512 SIMD units. Additionally, 40 PIM cores match the performance of two AMD EPYC 7551 CPUs, with 32 cores each and just 87 PIM cores


“…This can be done by computing D non-uniform Fourier transforms (see Wang et al (2022); Gossard et al (2022); Wang and Fessler (2021)). Different packages were tested and we finally opted for the cuFINUFFT implementation Shih et al (2021). The bindings for different kind of NUFT are available at https://github.com/albangossard/Bindings-NUFFT-pytorch/.…”

Section: Ethical Standards (mentioning)

confidence: 99%

Bayesian Optimization of Sampling Densities in MRI

Gossard, Gournay, Weiss 2023

Melba


Data-driven optimization of sampling patterns in MRI has recently received significant attention. Following recent observations on the combinatorial number of minimizers in off-the-grid optimization, we propose a framework to globally optimize the sampling densities using Bayesian optimization. Using a dimension reduction technique, we optimize the sampling trajectories more than 20 times faster than conventional off-the-grid methods, with a restricted number of training samples. This method, among other benefits, removes the need for automatic differentiation. Its performance is slightly worse than state-of-the-art learned trajectories since it reduces the space of admissible trajectories, but comes with significant computational advantages. Other contributions include: i) a careful evaluation of the distance in probability space to generate trajectories; ii) a specific training procedure on families of operators for unrolled reconstruction networks; and iii) a gradient-projection-based scheme for trajectory optimization.


“…Therefore, the total latency can be limited below 500 ms (66 ms+ 250 ms+150 ms=466 ms). Moreover, algorithm optimization and dedicated hardware to improve the efficiency of NUFFT and DVF computations can further reduce the latency (Knoll et al 2014, Shih et al 2021.…”

Section: Prospect Of Real-time MR Imaging (mentioning)

confidence: 99%

Real-time MRI motion estimation through an unsupervised k-space-driven deformable registration network (KS-RegNet)

Shao, Li, Dohopolski, et al. 2022

Phys. Med. Biol.


Purpose: Real-time three-dimensional (3D) magnetic resonance (MR) imaging is challenging because of slow MR signal acquisition, leading to highly under-sampled k-space data. Here, we proposed a deep learning-based, k-space-driven deformable registration network (KS-RegNet) for real-time 3D MR imaging. By incorporating prior information, KS-RegNet performs a deformable image registration between a fully-sampled prior image and on-board images acquired from highly-under-sampled k-space data, to generate high-quality on-board images for real-time motion tracking. Methods: KS-RegNet is an end-to-end, unsupervised network consisting of an input data generation block, a subsequent U-Net core block, and following operations to compute data fidelity and regularization losses. The input data involved a fully-sampled, complex-valued prior image, and the k-space data of an on-board, real-time MR image (MRI). From the k-space data, under-sampled real-time MRI was reconstructed by the data generation block to input into the U-Net core. In addition, to train the U-Net core to learn the under-sampling artifacts, the k-space data of the prior image was intentionally under-sampled using the same readout trajectory as the real-time MRI, and reconstructed to serve as an additional input. The U-Net core predicted a deformation vector field that deforms the prior MRI to on-board real-time MRI. To avoid adverse effects of quantifying image similarity on the artifacts-ridden images, the data fidelity loss of deformation was evaluated directly in k-space. Results: Compared with Elastix and other deep learning network architectures, KS-RegNet demonstrated better and more stable performance. The average (±s.d.) DICE coefficients of KS-RegNet on a cardiac dataset for the 5-, 9-, and 13-spoke k-space acquisitions were 0.884±0.025, 0.889±0.024, and 0.894±0.022, respectively; and the corresponding average (±s.d.) center-of-mass errors (COMEs) were 1.21±1.09, 1.29±1.22, and 1.01±0.86 mm, respectively. KS-RegNet also provided the best performance on an abdominal dataset. Conclusion: KS-RegNet allows real-time MRI generation with sub-second latency. It enables potential real-time MR-guided soft tissue tracking, tumor localization, and radiotherapy plan adaptation.
