2016
DOI: 10.1016/j.ascom.2016.05.004

Faster GPU-based convolutional gridding via thread coarsening

Abstract: Convolutional gridding is a processor-intensive step in interferometric imaging. While it is possible to use graphics processing units (GPUs) to accelerate this operation, existing methods use only a fraction of the available flops. We apply thread coarsening to improve the efficiency of an existing algorithm, and observe performance gains of up to 3.2× for single-polarization gridding and 1.9× for quad-polarization gridding on a GeForce GTX 980, and smaller but still significant gains on a Radeon R9 290X.
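Thread coarsening, in general terms, merges the work of several fine-grained threads into one, so that a value loaded once can be reused across several outputs. The Python sketch below only illustrates that idea on a toy 1D gridding problem and is not the paper's GPU kernel. The function names, the triangular kernel, and the load counter are all hypothetical.

```python
def kernel_weight(offset, support):
    """Hypothetical triangular kernel (an assumption, not the paper's function)."""
    return max(0.0, 1.0 - abs(offset) / support)


def grid_fine(vis, grid_size, support):
    """One simulated 'thread' per grid cell; each one loads every visibility itself."""
    grid = [0.0] * grid_size
    loads = 0
    for cell in range(grid_size):              # one thread per cell
        for pos, value in vis:
            loads += 1                          # separate load in every thread
            grid[cell] += value * kernel_weight(cell - pos, support)
    return grid, loads


def grid_coarse(vis, grid_size, support, coarsen):
    """One simulated 'thread' per `coarsen` adjacent cells; a visibility is
    loaded once and reused for all cells owned by that thread."""
    grid = [0.0] * grid_size
    loads = 0
    for first in range(0, grid_size, coarsen):  # one thread per block of cells
        cells = range(first, min(first + coarsen, grid_size))
        for pos, value in vis:
            loads += 1                          # single load shared by owned cells
            for cell in cells:
                grid[cell] += value * kernel_weight(cell - pos, support)
    return grid, loads


if __name__ == "__main__":
    vis = [(2.5, 1.0), (5.0, 2.0)]              # (position, value) pairs, made up
    fine, fine_loads = grid_fine(vis, 8, 2)
    coarse, coarse_loads = grid_coarse(vis, 8, 2, 4)
    print(fine == coarse, fine_loads, coarse_loads)
```

With 8 cells, 2 visibilities and `coarsen=4`, both functions produce the same grid, while the coarse version performs 4 simulated loads instead of 16. On a real GPU the trade-off also involves register pressure and occupancy, which this sketch does not model.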

Cited by: 17 publications (19 citation statements)
References: 11 publications
“…Therefore, we regard pre-processing as part of an initialisation step that will not run for every major cycle and therefore not subject to a study of its Performance in this thesis. Similar reasoning was made by Merry [85] in analysing an enhanced Convolutional Gridder for w-projection.…”
Section: Pre-processing
Citation type: supporting
confidence: 54%
“…As previously noted, Greisen [75] states that C co (ξ) should be stored in a lookup table, as to reduce computation. This practice, which we refer to as GCF oversampling is until today a standard rule in the implementation of Convolutional Gridding (for example CASA), since the convolution function is usually computationally intensive to calculate for each record during the gridding process (Merry [85]).…”
Section: GCF Oversampling
Citation type: mentioning
confidence: 99%
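The lookup-table practice described in the statement above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in CASA or in any cited work: the Gaussian function, the nearest-sample lookup, and all names and parameters are hypothetical.

```python
import math


def gaussian_gcf(x, width=1.0):
    """Hypothetical convolution function, chosen only for illustration."""
    return math.exp(-(x / width) ** 2)


def build_gcf_table(support, oversample):
    """Tabulate the function at `oversample` samples per grid cell
    over the interval [-support, support]."""
    half = support * oversample
    return [gaussian_gcf((i - half) / oversample) for i in range(2 * half + 1)]


def lookup_gcf(table, offset, support, oversample):
    """Nearest-sample lookup; offsets outside the support contribute zero."""
    if abs(offset) > support:
        return 0.0
    index = round(offset * oversample) + support * oversample
    return table[index]


if __name__ == "__main__":
    table = build_gcf_table(support=3, oversample=8)
    # Compare the tabulated value with a direct evaluation at the same offset.
    print(lookup_gcf(table, 0.3, 3, 8), gaussian_gcf(0.3))
```

The table is built once, so the cost of evaluating the function is paid per table entry rather than per record. The price is a quantisation error that shrinks as `oversample` grows.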
“…Romein [9] proposed a GPU-based work-distribution strategy for W-projection gridding, which optimizes the data accumulation in on-chip registers rather than in off-chip memory and keeps the number of expensive off-chip memory accesses very low. This work was further improved in the work of Merry [24], where the author applied thread coarsening to improve the efficiency of grid computing and observed performance gains for single-polarization gridding and quad-polarization gridding on the target GPU. For the work of Muscat [25] observed that in some situations, especially for short baselines, the positions of two neighboring visibilities are the same on the higher resolution grid, and we do not need to grid each visibility independently.…”
Section: B Related Work
Citation type: mentioning
confidence: 99%
“…They also compare the performance on different high-end platforms in CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) to further measure the performance. Based on Romein's algorithm, Merry [26] presents a thread coarsening method where the multiple work items are merged into one across parallel work items to improve instruction-level parallelism and the efficiency of gridding computing for single-polarization and quad-polarization on the target GPUs. Veenboer et al [10] initiated the CPUs-based and GPUs-based image-domain gridding (IDG) algorithm and first presented the efficient degridding implementation on GPUs.…”
Section: Related Work
Citation type: mentioning
confidence: 99%