The continual evolution of photon sources and high-performance detectors drives cutting-edge experiments that can produce very high throughput data streams and generate large data volumes that are challenging to manage and store. In these cases, efficient data transfer and processing architectures that allow online image correction, data reduction or compression become fundamental. This work investigates different technical options and methods for data placement from the detector head to the processing computing infrastructure, taking into account the particularities of modern modular high-performance detectors. In order to compare realistic figures, the future ESRF beamline dedicated to macromolecular X-ray crystallography, EBSL8, is taken as an example; it will use a PSI JUNGFRAU 4M detector generating up to 16 GB of data per second and operating continuously for several minutes. Although such an experiment seems feasible at the target speed with the 100 Gb s⁻¹ network cards currently available, the simulations performed highlight potential bottlenecks when a traditional software stack is used. An evaluation of solutions that implement remote direct memory access (RDMA) over Converged Ethernet techniques is presented. A synchronization mechanism is proposed between an RDMA network interface card (RNIC) and a graphics processing unit (GPU) accelerator in charge of the online data processing. The placement of the detector images onto the GPU is made to overlap with the computation, potentially hiding the transfer latencies. As a proof of concept, a detector simulator and a backend GPU receiver with a rejection and compression algorithm suitable for a synchrotron serial crystallography (SSX) experiment are developed. It is concluded that the available transfer throughput from the RNIC to the GPU accelerator is at present the major bottleneck in online processing for SSX experiments.
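The overlap of image placement with GPU computation mentioned above can be illustrated with a minimal CUDA sketch (not the implementation used in this work): two streams and double-buffered device memory let the asynchronous copy of one frame proceed while the previous frame is being processed. The frame size, buffer count and the toy process_frame kernel are hypothetical placeholders; a real RoCE pipeline would land frames in GPU memory directly from the RNIC rather than staging them through pinned host buffers.

```cuda
// Minimal sketch: double-buffered overlap of host-to-GPU frame transfers
// with on-GPU processing using CUDA streams. Illustrative only.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

constexpr size_t FRAME_PIXELS = 4 * 1024 * 1024;  // hypothetical 4M-pixel frame
constexpr int    N_FRAMES     = 64;               // frames to stream

// Toy stand-in for the per-frame rejection/compression step.
__global__ void process_frame(uint16_t* frame, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n && frame[i] < 3) frame[i] = 0;      // trivial pixel rejection
}

int main() {
    uint16_t*    h_buf[2];   // pinned host buffers (stand-in for received frames)
    uint16_t*    d_buf[2];   // ping-pong device buffers
    cudaStream_t stream[2];
    for (int b = 0; b < 2; ++b) {
        cudaMallocHost(&h_buf[b], FRAME_PIXELS * sizeof(uint16_t));
        cudaMalloc(&d_buf[b], FRAME_PIXELS * sizeof(uint16_t));
        cudaStreamCreate(&stream[b]);
    }

    for (int f = 0; f < N_FRAMES; ++f) {
        int b = f & 1;  // alternate streams so copy of frame f+1 overlaps compute of frame f
        cudaMemcpyAsync(d_buf[b], h_buf[b], FRAME_PIXELS * sizeof(uint16_t),
                        cudaMemcpyHostToDevice, stream[b]);
        process_frame<<<(FRAME_PIXELS + 255) / 256, 256, 0, stream[b]>>>(
            d_buf[b], FRAME_PIXELS);
    }
    cudaDeviceSynchronize();

    for (int b = 0; b < 2; ++b) {
        cudaStreamDestroy(stream[b]);
        cudaFree(d_buf[b]);
        cudaFreeHost(h_buf[b]);
    }
    printf("streamed %d frames\n", N_FRAMES);
    return 0;
}
```

Because operations issued to the same stream execute in order, reusing a buffer for frame f+2 is safe: its copy is queued behind the kernel that processes frame f, while the two streams give the scheduler the freedom to run a copy and a kernel concurrently and thus hide part of the transfer latency.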