Proceedings of the 19th International Workshop on Data Management on New Hardware 2023
DOI: 10.1145/3592980.3595314

The Difficult Balance Between Modern Hardware and Conventional CPUs

Abstract: Research has demonstrated the potential of accelerators in a wide range of use cases. However, there is a growing imbalance between modern hardware and the CPUs that submit the workload. Recent studies of GPUs on real systems have shown that many servers are often needed per accelerator to generate a high enough load so the computing power is leveraged. This fact is often ignored in research, even though it frequently determines the actual feasibility and overall efficiency of a deployment. In this paper, we conduct a…

Cited by 3 publications (3 citation statements) | References 24 publications
“…The PCIe bus, as evident from both Figure 8 and Figure 11, does not perform optimally when the ratio between computation and data transfer is high, meaning that there is insufficient data to maximise its throughput. To overcome this limitation, the conventional approach has been to batch process multiple individual invocations into a single operation, which trades higher throughput for significantly increased individual latency [39]. The introduction of Strega, which provides an even higher level of abstraction for FPGA-based kernels compared to OpenCL without compromising performance, marks a significant milestone in the integration of heterogeneous hardware into distributed systems.…”
Section: Discussion
confidence: 99%
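The throughput-versus-latency trade-off of batching described in the excerpt above can be made concrete with a back-of-the-envelope model: a PCIe transfer costs a fixed per-transfer overhead plus a bandwidth-proportional term, so amortising the overhead over a batch raises sustained throughput while every request in the batch waits for the whole transfer to complete. The C sketch below uses illustrative constants; the 5 µs overhead, 12 GB/s effective bandwidth, and 512-byte payload are assumptions, not measurements from the paper or its citers.

```c
/* Back-of-the-envelope model of the batching trade-off: a PCIe
 * transfer is modelled as a fixed per-transfer overhead plus a
 * bandwidth term. Amortising the overhead over a batch raises
 * throughput, but every request in the batch experiences the full
 * transfer latency. All constants are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    const double overhead_us = 5.0;    /* assumed fixed cost per transfer       */
    const double bw_bytes_us = 12e3;   /* assumed 12 GB/s = 12,000 bytes per us */
    const double req_bytes   = 512.0;  /* assumed payload of one invocation     */

    for (int batch = 1; batch <= 1024; batch *= 8) {
        double xfer_us = overhead_us + batch * req_bytes / bw_bytes_us;
        printf("batch=%4d  latency=%6.2f us  per-req=%6.3f us  throughput=%6.2f req/us\n",
               batch,
               xfer_us,          /* latency experienced by each request in the batch */
               xfer_us / batch,  /* amortised transfer cost per request              */
               batch / xfer_us); /* sustained requests per microsecond               */
    }
    return 0;
}
```

With these assumed numbers, growing the batch from 1 to 512 requests raises throughput by roughly two orders of magnitude while the latency experienced by each individual request grows about fivefold, which is precisely the trade-off the excerpt describes.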
“…Both communicate through the PCIe bus, much like GPU kernels, as illustrated in Figure 1: (i) memory is first allocated on the device; (ii) data is transferred from the host to the device; (iii) the kernel is executed; and (iv) the CPU fetches the result data from the device memory. In the context of distributed systems deployed in the cloud, the consequence of this flow is that client requests must pass through the CPU in order to be accelerated by the FPGA, imposing not only considerable communication overhead [41] but, more importantly, a very tight coupling between the host and the accelerator [39].…”
Section: Introduction
confidence: 99%
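The four-step flow quoted above maps directly onto the standard OpenCL host API. The sketch below is a minimal illustration, not code from the paper: the kernel name "accel", the float payload, and the blocking transfers are assumptions, and platform/device setup, program build, and error handling are elided.

```c
/* Minimal sketch of the four-step host/device flow described in the
 * excerpt, using the standard OpenCL C API. Names and payload types
 * are hypothetical; setup and error handling are omitted. */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>

void run_once(cl_context ctx, cl_command_queue q, cl_kernel accel,
              const float *in, float *out, size_t n) {
    cl_int err;
    size_t bytes = n * sizeof(float);

    /* (i) allocate memory on the device */
    cl_mem d_in  = clCreateBuffer(ctx, CL_MEM_READ_ONLY,  bytes, NULL, &err);
    cl_mem d_out = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, NULL, &err);

    /* (ii) transfer the input from the host to the device over PCIe */
    clEnqueueWriteBuffer(q, d_in, CL_TRUE, 0, bytes, in, 0, NULL, NULL);

    /* (iii) execute the kernel on the accelerator */
    clSetKernelArg(accel, 0, sizeof(cl_mem), &d_in);
    clSetKernelArg(accel, 1, sizeof(cl_mem), &d_out);
    size_t global = n;
    clEnqueueNDRangeKernel(q, accel, 1, NULL, &global, NULL, 0, NULL, NULL);

    /* (iv) the CPU fetches the result back from device memory */
    clEnqueueReadBuffer(q, d_out, CL_TRUE, 0, bytes, out, 0, NULL, NULL);

    clReleaseMemObject(d_in);
    clReleaseMemObject(d_out);
}
```

Every invocation pays steps (i) through (iv) on the host, so each client request is funnelled through CPU memory before reaching the accelerator; this is the communication overhead and tight host-accelerator coupling that the excerpt highlights.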