An efficient work-distribution strategy for gridding radio-telescope data on GPUs

Romein, John W.

doi:10.1145/2304576.2304620

Cited by 28 publications

(25 citation statements)

References 5 publications

(5 reference statements)

“…As a workaround, we added the statement asm("") as a compiler-level memory barrier. Romein (2012) found that the majority of memory traffic was due to cache misses in reading the convolution GCF values. To avoid this problem, we have used a separable approximation to the GCF (Merry, 2016).…”

Section: Implementation Details (mentioning)

confidence: 99%

Faster GPU-based convolutional gridding via thread coarsening

Merry

2016

Astronomy and Computing

Convolutional gridding is a processor-intensive step in interferometric imaging. While it is possible to use graphics processing units (GPUs) to accelerate this operation, existing methods use only a fraction of the available flops. We apply thread coarsening to improve the efficiency of an existing algorithm, and observe performance gains of up to 3.2× for single-polarization gridding and 1.9× for quad-polarization gridding on a GeForce GTX 980, and smaller but still significant gains on a Radeon R9 290X.
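
For readers unfamiliar with the operation, the following is a minimal CPU sketch of convolutional gridding with a separable kernel. It does not reproduce thread coarsening, Romein's work-distribution strategy, or any GPU code from the cited papers. The function name `grid_visibilities`, the integer `(u, v)` corner coordinates, and the three-tap kernel are assumptions made for illustration only.

```python
def grid_visibilities(vis, taps_u, taps_v, size):
    """Accumulate visibilities onto a size x size complex grid.

    vis: iterable of (u, v, value); (u, v) is the grid position of the
         kernel's first tap (hypothetical convention), value is complex.
    taps_u, taps_v: 1-D kernel taps. The 2-D kernel is assumed to be
         their outer product (the separability assumption).
    """
    grid = [[0j] * size for _ in range(size)]
    for u, v, value in vis:
        for j, wv in enumerate(taps_v):
            row = grid[v + j]
            for i, wu in enumerate(taps_u):
                # Each visibility touches len(taps_u) * len(taps_v) cells.
                row[u + i] += value * wv * wu
    return grid


# Illustrative input: one unit visibility and a three-tap kernel.
taps = [0.25, 0.5, 0.25]
grid = grid_visibilities([(0, 0, 1 + 0j)], taps, taps, size=4)
```

With a separable kernel, only the two 1-D tap arrays need to be read per visibility rather than a full 2-D kernel table, which is the memory-traffic motivation mentioned in the citation statement above.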

“…However, the irregular data access patterns make this a non-trivial task. One of the first really practical algorithms for GPU-accelerated gridding is due to Romein (2012). Despite being state of the art, it typically spends only about 25% of a GPU's compute power on the actual convolution operations.…”

Section: Introduction (mentioning)

confidence: 99%

Faster GPU-based convolutional gridding via thread coarsening

Merry

2016

Astronomy and Computing

“…We can investigate computational efficiency of the most costly algorithmic components, an estimate based on our current understanding of the required processing, on current day best-of-breed hardware. This shows very poor efficiency of at most 20% of R_peak [1], [10].…”

Section: A. Defining the Required SDP Capacity (mentioning)

confidence: 99%

The Square Kilometre Array Science Data Processor. Preliminary compute platform design

2015

The Square Kilometre Array is a next-generation radio telescope, to be built in South Africa and Western Australia. It is currently in its detailed design phase, with procurement and construction scheduled to start in 2017. The SKA Science Data Processor is the high-performance computing element of the instrument, responsible for producing science-ready data. This is a major IT project, with the Science Data Processor expected to challenge the computing state of the art even in 2020. In this paper we introduce the preliminary Science Data Processor design and the principles that guide the design process, as well as the constraints on the design. We introduce a highly scalable and flexible system architecture capable of handling the SDP workload.

“…While this may be sufficient for some applications, in the general case other elements in the data path pose unsolved problems of scale owing to dependence on at least N^2. The computation challenges associated with gridding irregularly spaced visibilities in preparation for FFT imaging (Romein, 2012), and subtraction of sky models from correlator output in the visibility domain (Mitchell et al., 2008), for example, will also need to be addressed.…”

Section: Scalability (mentioning)

confidence: 99%

A Scalable Hybrid FPGA/GPU FX Correlator

Kocz, Greenhill, Barsdell, et al.

2014

J. Astron. Instrum.

Radio astronomical imaging arrays comprising large numbers of antennas, O(10^2–10^3), have posed a signal processing challenge because of the required O(N^2) cross correlation of signals from each antenna and requisite signal routing. This motivated the implementation of a Packetized Correlator architecture that applies Field Programmable Gate Arrays (FPGAs) to the O(N) "F-stage" transforming time domain to frequency domain data, and Graphics Processing Units (GPUs) to the O(N^2) "X-stage" performing an outer product among spectra for each antenna. The design is readily scalable to at least O(10^3) antennas. Fringes, visibility amplitudes and sky image results obtained during field testing are presented.
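
The O(N^2) cost of the "X-stage" can be seen in a small sketch of the outer product for a single frequency channel. This is an illustration under stated assumptions, not the cited correlator's implementation: the function name `x_stage` and the dictionary output are hypothetical, and a real correlator would also accumulate these products over many time samples.

```python
def x_stage(spectra):
    """Cross-multiply one channel's samples for every antenna pair.

    spectra: list of complex values, one per antenna, for one channel.
    Returns a dict mapping (i, j), i <= j, to spectra[i] * conj(spectra[j]).
    The number of entries is N * (N + 1) / 2, hence O(N^2) work.
    """
    n = len(spectra)
    return {
        (i, j): spectra[i] * spectra[j].conjugate()
        for i in range(n)
        for j in range(i, n)
    }


# Illustrative input: three antennas, one channel.
products = x_stage([1 + 1j, 2 + 0j, 0 + 1j])
```

Doubling the number of antennas roughly quadruples the number of products, which is why the abstract assigns this stage to GPUs.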
