Faster GPU-based convolutional gridding via thread coarsening

Merry, Bruce

doi:10.1016/j.ascom.2016.05.004

Cited by 17 publications

(19 citation statements)

References 11 publications


“…Therefore, we regard pre-processing as part of an initialisation step that will not run for every major cycle and therefore not subject to a study of its Performance in this thesis. Similar reasoning was made by Merry [85] in analysing an enhanced Convolutional Gridder for w-projection.…”

Section: Pre-processing (supporting)

confidence: 54%

“…As previously noted, Greisen [75] states that C_co(ξ) should be stored in a lookup table, as to reduce computation. This practice, which we refer to as GCF oversampling is until today a standard rule in the implementation of Convolutional Gridding (for example CASA), since the convolution function is usually computationally intensive to calculate for each record during the gridding process (Merry [85]).…”

Section: GCF Oversampling (mentioning)

confidence: 99%
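
The lookup-table practice described in the statement above can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration: the function names `build_gcf_table` and `lookup_taps`, the Gaussian stand-in for the gridding convolution function (GCF), and the choice of rounding down to the nearest sub-pixel bin. It is not taken from the cited thesis or from Merry's implementation.

```python
import math


def gaussian(x, sigma=1.0):
    # Stand-in GCF (an assumption); real gridders use other functions.
    return math.exp(-0.5 * (x / sigma) ** 2)


def build_gcf_table(support, oversample, gcf):
    """Precompute the GCF once at `oversample` sub-pixel offsets.

    table[s][k] holds the weight of tap k for a sample whose fractional
    grid position is s / oversample.
    """
    half = support // 2
    table = []
    for s in range(oversample):
        frac = s / oversample
        # Tap k lands on grid cell (base - half + k); its distance from
        # the sample at (base + frac) is (k - half - frac).
        table.append([gcf(k - half - frac) for k in range(support)])
    return table


def lookup_taps(table, oversample, u):
    """Return (first grid index, tap weights) for a sample at coordinate u."""
    support = len(table[0])
    half = support // 2
    base = math.floor(u)
    # Round down to a sub-pixel bin; clamp guards against float round-off.
    s = min(int((u - base) * oversample), oversample - 1)
    return base - half, table[s]


table = build_gcf_table(support=5, oversample=4, gcf=gaussian)
start, taps = lookup_taps(table, 4, 10.5)
```

During gridding, each record then costs one table lookup instead of `support` evaluations of the convolution function, at the price of quantising the sub-pixel position to `1 / oversample` of a cell.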

“…Enhanced by Muscat [21] and Merry [85], Romein's strategy is the fastest algorithm which we are aware of, able to grid on GPUs. It acquires high-performance by taking advantage of the trajectory of records for a given baseline.…”

Section: Romein's Moving Window Strategy (mentioning)

confidence: 99%


High-Performance Gridding For Radio Interferometric Image Synthesis

Muscat

2021

Preprint




“…Romein [9] proposed a GPU-based work-distribution strategy for W-projection gridding, which optimizes the data accumulation in on-chip registers rather than in off-chip memory and keeps the number of expensive off-chip memory accesses very low. This work was further improved in the work of Merry [24], where the author applied thread coarsening to improve the efficiency of grid computing and observed performance gains for single-polarization gridding and quad-polarization gridding on the target GPU. For the work of Muscat [25] observed that in some situations, especially for short baselines, the positions of two neighboring visibilities are the same on the higher resolution grid, and we do not need to grid each visibility independently.…”

Section: B Related Work (mentioning)

confidence: 99%
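
The register-accumulation idea in the statement above can be sketched as a sequential one-dimensional simulation. This is a simplified illustration under stated assumptions, not Romein's or Merry's actual kernel: the function names, the `coarsen` parameter, the 1-D grid, and the fixed tap weights are all hypothetical. Each simulated thread owns the grid cells congruent to one residue modulo the kernel support, keeps a running sum in a local variable (standing in for a register), and writes to the grid only when its cell changes. With `coarsen > 1`, one thread handles several residues, which is one simple way to picture thread coarsening.

```python
def grid_naive(starts, values, weights, grid_size):
    """Reference gridder: one grid update per visibility per tap."""
    grid = [0.0] * grid_size
    writes = 0
    for start, v in zip(starts, values):
        for k, w in enumerate(weights):
            grid[start + k] += w * v
            writes += 1
    return grid, writes


def grid_register_accumulate(starts, values, weights, grid_size, coarsen=1):
    """Accumulate locally and flush only when the owned cell changes."""
    support = len(weights)
    assert support % coarsen == 0
    grid = [0.0] * grid_size
    writes = 0
    for thread in range(support // coarsen):
        # A coarsened thread handles `coarsen` residues instead of one.
        for j in range(coarsen):
            residue = thread * coarsen + j
            acc, pos = 0.0, None
            for start, v in zip(starts, values):
                # Unique cell in [start, start + support) with
                # cell % support == residue.
                cell = start + (residue - start) % support
                if cell != pos:
                    if pos is not None:
                        grid[pos] += acc  # flush to "off-chip" grid
                        writes += 1
                    acc, pos = 0.0, cell
                acc += weights[cell - start] * v
            if pos is not None:
                grid[pos] += acc
                writes += 1
    return grid, writes


starts = [3, 3, 3, 4, 4, 5]            # slowly moving window positions
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weights = [0.25, 0.5, 0.25, 0.0]       # toy kernel, support 4
ref, ref_writes = grid_naive(starts, values, weights, 16)
out, out_writes = grid_register_accumulate(starts, values, weights, 16, coarsen=2)
```

Because consecutive visibilities on a baseline tend to fall on nearby grid positions, the accumulating version issues far fewer grid writes than the naive one while producing the same grid. The simulation shows only the work assignment; it says nothing about the instruction-level parallelism or speedups reported for real GPUs.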

FPGA-Based Scale-Out Prototyping of Degridding Algorithm for Accelerating Square Kilometre Array Telescope Data Processing

Hou, Zhu, et al. 2020

IEEE Access


The SKA (Square Kilometre Array) radio telescope will become the most sensitive telescope by correlating a large number of antenna nodes to form a giant antenna array. The data generated from such a large number of antenna nodes will pose a huge storage problem and require real-time data processing to make the best use of data, and the SKA Scientific Data Processing becomes the bottleneck of the whole processing flow. However, the existing high-performance CPU- and GPU (Graphics Processing Unit)-based solutions cannot satisfy the performance requirements and power budget requirements well [1]. Due to the consideration of the high energy efficiency of hardware accelerators and the flexibility and cost of prototype design, in this paper, we explore the FPGA (Field Programmable Gate Array)-based prototype of one of the most computationally demanding procedures in SKA scientific data processing: degridding. Through the analysis of algorithm behavior and bottlenecks, we design and optimize the memory architecture and computing logic of an FPGA-based prototype. Besides, with the consideration of the relations between the required data of processing multiple spectral channels, we reuse the shared data in processing neighboring spectral channels, and the performance further improves. The functionality and performance of our design have been verified on the target FPGA board, and the software-based benchmarks were also measured on comparable CPU and GPU platforms, indicating that the FPGA-based prototype achieves 2.74 times and 2.03 times speedup, 7.64 times and 7.42 times energy efficiency than the MPI (Message Passing Interface)-based CPU benchmark and the CUDA (Compute Unified Device Architecture)-based GPU benchmark, respectively.

Index terms: FPGA, gridding/degridding, scientific data processing, square kilometre array.


“…They also compare the performance on different high-end platforms in CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) to further measure the performance. Based on Romein's algorithm, Merry [26] presents a thread coarsening method where the multiple work items are merged into one across parallel work items to improve instruction-level parallelism and the efficiency of gridding computing for single-polarization and quad-polarization on the target GPUs. Veenboer et al [10] initiated the CPUs-based and GPUs-based image-domain gridding (IDG) algorithm and first presented the efficient degridding implementation on GPUs.…”

Section: Related Work (mentioning)

confidence: 99%

Accelerating Faceting Wide-Field Imaging Algorithm with FPGA for SKA Radio Telescope as a Vast Sensor Array

Song, Zhu, Nan, et al. 2020

Sensors


The SKA (Square Kilometer Array) radio telescope will become the most sensitive telescope by correlating a huge number of antenna nodes to form a vast array of sensors in a region over one hundred kilometers. Faceting, the wide-field imaging algorithm, is a novel approach towards solving image construction from sensing data where earth surface curves cannot be ignored. However, the traditional processor of cloud computing, even if the most sophisticated supercomputer is used, cannot meet the extremely high computation performance requirement. In this paper, we propose the design and implementation of high-efficiency FPGA (Field Programmable Gate Array)-based hardware acceleration of the key algorithm, faceting in SKA by focusing on phase rotation and gridding, which are the most time-consuming phases in the faceting algorithm. Through the analysis of algorithm behavior and bottleneck, we design and optimize the memory architecture and computing logic of the FPGA-based accelerator. The simulation and tests on FPGA are done to confirm the acceleration result of our design and it is shown that the acceleration performance we achieved on phase rotation is 20× the result of the previous work. We then further designed and optimized an efficient microstructure of loop unrolling and pipeline for the gridding accelerator, and the designed system simulation was done to confirm the performance of our structure. The result shows that the acceleration ratio is 5.48 compared to the result tested on software in gridding parts. Hence, our approach enables efficient acceleration of the faceting algorithm on FPGAs with high performance to meet the computational constraints of SKA as a representative vast sensor array.
