JetStream: An open-source high-performance PCI Express 3 streaming library for FPGA-to-Host and FPGA-to-FPGA communication

Vesper, Malte; Koch, Dirk; Vipin, Kizheppatt; Fahmy, Suhaib A.

doi:10.1109/fpl.2016.7577334

Cited by 30 publications

(13 citation statements)

References 9 publications

“…Custom clusters are based on the concept of systolic array model in parallel computing architecture, where every node acts as a data processing unit and processed data move from one node to another through first-in first-out (FIFO) buffer or network semantics. Some of these architectures [115][116][117][118] use Peer to Peer (P2P) connection MaxRing, fast series transceivers with FIFO buffers, and Peripheral Component Interconnect Express (PCIe) links, for transmitting data across multiple nodes. Tailored designs allow the direct communication among the nodes through explicit network connections.…”

Section: Custom Clusters (mentioning)

confidence: 99%

Revisiting the High-Performance Reconfigurable Computing for Future Datacenters

2020

Modern datacenters are reinforcing the computational power and energy efficiency by assimilating field programmable gate arrays (FPGAs). The sustainability of this large-scale integration depends on enabling multi-tenant FPGAs. This requisite amplifies the importance of communication architecture and virtualization method with the required features in order to meet the high-end objective. Consequently, in the last decade, academia and industry proposed several virtualization techniques and hardware architectures for addressing resource management, scheduling, adoptability, segregation, scalability, performance-overhead, availability, programmability, time-to-market, security, and mainly, multitenancy. This paper provides an extensive survey covering three important aspects—discussion on non-standard terms used in existing literature, network-on-chip evaluation choices as a mean to explore the communication architecture, and virtualization methods under latest classification. The purpose is to emphasize the importance of choosing appropriate communication architecture, virtualization technique and standard language to evolve the multi-tenant FPGAs in datacenters. None of the previous surveys encapsulated these aspects in one writing. Open problems are indicated for scientific community as well.

“…Single FPGA [4,76,77,86,87,106,107,109,115,117,118,125] [23,25,75,78,79,[84][85][86]89,107] Multiple FPGAs [4,77,87,106,115,117,125] [41,75,78,79]…”

Section: Single Application Multiple Applications (mentioning)

confidence: 99%

“…The accelerators must be accessed through an underlying infrastructure that interfacing host and FPGA accelerators. There are currently a few numbers of academic [5,[8][9][10][11] and industrial [6, 7] frameworks that are designed to connect a host to the FPGA accelerators. However, they either do not support or fail to provide a seamless interface to multiple accelerators accessed simultaneously by various applications.…”

Section: Background and Motivation (mentioning)

confidence: 99%

“…Through a static accelerator allocation [5][6][7][8][9][10] exploiting parallelism is not practically possible, since applications are not aware of the status of all the accelerators on FPGAs.…”

Section: Background and Motivation (mentioning)

confidence: 99%

“…For every single request from the host, an accelerator on the FPGA is allocated to the request until the result is sent back to the host. To the best of our knowledge, all the current FPGA accelerator frameworks [5][6][7][8][9][10] follow a static accelerator allocation scheme; it means that software developers have to exactly specify the target accelerators for any access request in the software code. This can lead to a poor utilization of accelerators when being shared among various applications.…”

Section: Introduction (mentioning)

confidence: 99%

UltraShare: FPGA-based Dynamic Accelerator Sharing and Allocation

Rezaei, Bozorgzadeh, Kim

2019

2019 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

Despite all the available commercial and open-source frameworks to ease deploying FPGAs in accelerating applications, the current schemes fail to support sharing multiple accelerators among various applications. There are three main features that an accelerator sharing scheme requires to support: exploiting dynamic parallelism of multiple accelerators for a single application, sharing accelerators among multiple applications, and providing a non-blocking congestion-free environment for applications to invoke the accelerators. In this paper, we developed a scalable fully functional hardware controller, called UltraShare, with a supporting software stack that provides a dynamic accelerator sharing scheme through an accelerators grouping mechanism. UltraShare allows software applications to fully utilize FPGA accelerators in a non-blocking congestion-free environment. Our experimental results for a simple scenario of a combination of three streaming accelerators invocation show an improvement of up to 8x in throughput of the accelerators by removing accelerators idle times.
