2013 International Conference on Field-Programmable Technology (FPT)
DOI: 10.1109/fpt.2013.6718331

Virtual-to-Physical address translation for an FPGA-based interconnect with host and GPU remote DMA capabilities

Abstract: We developed a custom FPGA-based Network Interface Controller named APEnet+ aimed at GPU-accelerated clusters for High Performance Computing. The card exploits the peer-to-peer capabilities (GPUDirect RDMA) of the latest NVIDIA GPGPU devices and the RDMA paradigm to perform fast direct communication between computing nodes, offloading network task execution from the host CPU. In this work we focus on the implementation of a virtual-to-physical address translation mechanism using the FPGA embedded soft processor. A…
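
To make the translation step described in the abstract more concrete, the C sketch below models a TLB lookup with a slow path for misses. It is a minimal illustration under assumed parameters (4 KiB pages, a 64-entry table); the names v2p_translate and slow_path_lookup are invented for the example and do not come from the APEnet+ firmware.

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SHIFT  12                  /* assuming 4 KiB pages */
    #define PAGE_MASK   ((1ULL << PAGE_SHIFT) - 1)
    #define TLB_ENTRIES 64                  /* illustrative TLB size */

    /* One cached translation: virtual page number -> physical frame number. */
    struct tlb_entry {
        uint64_t vpn;
        uint64_t pfn;
        int      valid;
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Placeholder for the slow path: in a design like the one described, this
     * would be firmware on the embedded soft processor searching the list of
     * registered (pinned) host or GPU buffers. Identity mapping here. */
    static uint64_t slow_path_lookup(uint64_t vpn)
    {
        return vpn;
    }

    /* Translate a virtual address to a physical address. */
    static uint64_t v2p_translate(uint64_t vaddr)
    {
        uint64_t vpn    = vaddr >> PAGE_SHIFT;
        uint64_t offset = vaddr & PAGE_MASK;

        /* Fully associative search; this is the step a hardware CAM performs
         * in a single cycle. */
        for (size_t i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn)
                return (tlb[i].pfn << PAGE_SHIFT) | offset;
        }

        /* Miss: take the slow path and cache the result. */
        uint64_t pfn = slow_path_lookup(vpn);
        struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
        e->vpn   = vpn;
        e->pfn   = pfn;
        e->valid = 1;
        return (pfn << PAGE_SHIFT) | offset;
    }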

Cited by 16 publications (13 citation statements)
References 8 publications
“…In [21] a number of CAM designs suitable for FPGA implementation are discussed. Our approach is a RAM-based design (in terms of resources used) but with a significant difference in the interpretation of the RAM ports semantics, as explained in [6].…”
Section: Hardware TLB Implementation (mentioning)
confidence: 99%
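
As a rough illustration of the RAM-based CAM idea mentioned in the quote above, the C model below emulates one common organization in which the RAM is addressed by the search key and stores one-hot match vectors, so a lookup is a single read instead of a parallel compare. All sizes and names (match_ram, cam_write, cam_lookup) are assumptions made for the sketch; it is not claimed to reproduce the cited design's port semantics.

    #include <stdint.h>

    #define KEY_BITS  8                        /* illustrative key width   */
    #define CAM_DEPTH 32                       /* illustrative entry count */

    static uint32_t match_ram[1u << KEY_BITS]; /* one-hot match vectors */

    /* Program entry 'idx' with 'key'; if the entry previously held a key,
     * clear its old match bit first. */
    static void cam_write(unsigned idx, uint8_t key, uint8_t old_key, int had_key)
    {
        if (had_key)
            match_ram[old_key] &= ~(1u << idx);
        match_ram[key] |= (1u << idx);
    }

    /* Search: one RAM read returns the set of entries whose content equals
     * 'key', which is what a hardware CAM delivers in a single cycle. */
    static uint32_t cam_lookup(uint8_t key)
    {
        return match_ram[key];
    }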
“…Just like on x86_64 an associative cache called Translation Lookaside Buffer (TLB) assists the MMU in its duties, we followed suit and implemented a TLB for APEnet+ by means of a Content Addressable Memory (CAM) [6]. The CAM implementation allows the lowest possible latency for the TLB, which achieves actual saturation over the links.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the HPC domain, SVM for letting FPGA accelerators themselves orchestrate data transfers from and to main memory has proven beneficial both for performance and programmability [3] [23]. While academic works focus on SVM solutions consisting of TLBs managed in software either running on the host [23] or on a dedicated soft processor core [24], the industry's approach is that of full-blown hardware for maximum performance [3] [4] [5]. To optimize off-chip bandwidth utilization between host and FPGA, such systems employ FPGA-side data caches, as well as transaction coalescing and reordering [25], which further increases design complexity and leads to considerable resource utilization.…”
Section: Related Work (mentioning)
confidence: 99%
“…The hardware follows the RDMA paradigm to manage the I/O towards both CPU and GPU (for the latter, the protocol is more precisely GPUDirect RDMA [5]), in this way avoiding the use of bounce buffers and minimizing the latency jitter of data transfers to and from application memory. The virtual-to-physical address translation is entrusted to a proprietary Translation Look-aside Buffer based on Content Addressable Memory [6]. In order to sustain the ∼ 320 MB/s aggregate bandwidth of the multi-channel system, incoming data are sent to the computing node by PCIe DMA write processes.…”
Section: NaNet (mentioning)
confidence: 99%
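
To sketch the receive path described in the quote above, the fragment below shows how translated addresses might be turned into page-sized PCIe DMA write descriptors. The descriptor layout, the 4 KiB page assumption, and the function names are illustrative only and are not taken from NaNet's actual firmware.

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SIZE 4096u   /* assumed page size for this sketch */

    /* Illustrative PCIe DMA write descriptor: physical target plus length. */
    struct dma_desc {
        uint64_t phys_addr;
        uint32_t len;
    };

    /* Translation callback, e.g. the TLB/CAM lookup sketched earlier. */
    typedef uint64_t (*translate_fn)(uint64_t vaddr);

    /* Split one received buffer into per-page DMA writes: contiguity in the
     * virtual address space says nothing about physical contiguity across
     * page boundaries, so each page gets its own descriptor. Returns the
     * number of descriptors produced. */
    static size_t build_dma_writes(uint64_t vaddr, uint32_t len,
                                   struct dma_desc *out, size_t max,
                                   translate_fn v2p)
    {
        size_t n = 0;
        while (len > 0 && n < max) {
            uint32_t in_page = PAGE_SIZE - (uint32_t)(vaddr & (PAGE_SIZE - 1));
            uint32_t chunk   = len < in_page ? len : in_page;
            out[n].phys_addr = v2p(vaddr);
            out[n].len       = chunk;
            vaddr += chunk;
            len   -= chunk;
            n++;
        }
        return n;
    }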