Proceedings of the 19th International Workshop on Data Management on New Hardware 2023
DOI: 10.1145/3592980.3595314

The Difficult Balance Between Modern Hardware and Conventional CPUs

Abstract: Research has demonstrated the potential of accelerators in a wide range of use cases. However, there is a growing imbalance between modern hardware and the CPUs that submit the workload. Recent studies of GPUs on real systems have shown that many servers are often needed per accelerator to generate a high enough load so the computing power is leveraged. This fact is often ignored in research, even though it frequently determines the actual feasibility and overall efficiency of a deployment. In this paper, we conduct a…

Cited by 3 publications (3 citation statements) | References 24 publications
“…The PCIe bus, as evident from both Figure 8 and Figure 11, does not perform optimally when the ratio between computation and data transfer is high, meaning that there is insufficient data to maximise its throughput. To overcome this limitation, the conventional approach has been to batch process multiple individual invocations into a single operation, which trades higher throughput for significantly increased individual latency [39]. The introduction of Strega, which provides an even higher level of abstraction for FPGA-based kernels compared to OpenCL without compromising performance, marks a significant milestone in the integration of heterogeneous hardware into distributed systems.…”
Section: Discussion
confidence: 99%
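The throughput-versus-latency trade-off of batching described in the excerpt above can be made concrete with a back-of-the-envelope model: a PCIe transfer costs a fixed per-transfer overhead plus a bandwidth-proportional term, so amortising the overhead over a batch raises sustained throughput while every request in the batch waits for the whole transfer to complete. The C sketch below uses illustrative constants; the 5 µs overhead, 12 GB/s effective bandwidth, and 512-byte payload are assumptions, not measurements from the paper or its citers.

```c
/* Back-of-the-envelope model of the batching trade-off: a PCIe
 * transfer is modelled as a fixed per-transfer overhead plus a
 * bandwidth term. Amortising the overhead over a batch raises
 * throughput, but every request in the batch experiences the full
 * transfer latency. All constants are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    const double overhead_us = 5.0;    /* assumed fixed cost per transfer       */
    const double bw_bytes_us = 12e3;   /* assumed 12 GB/s = 12,000 bytes per us */
    const double req_bytes   = 512.0;  /* assumed payload of one invocation     */

    for (int batch = 1; batch <= 1024; batch *= 8) {
        double xfer_us = overhead_us + batch * req_bytes / bw_bytes_us;
        printf("batch=%4d  latency=%6.2f us  per-req=%6.3f us  throughput=%6.2f req/us\n",
               batch,
               xfer_us,          /* latency experienced by each request in the batch */
               xfer_us / batch,  /* amortised transfer cost per request              */
               batch / xfer_us); /* sustained requests per microsecond               */
    }
    return 0;
}
```

With these assumed numbers, growing the batch from 1 to 512 requests raises throughput by roughly two orders of magnitude while the latency experienced by each individual request grows about fivefold, which is precisely the trade-off the excerpt describes.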
“…Both communicate through the PCIe bus, much like GPU kernels, as illustrated in Figure 1: (i) memory is first allocated on the device; (ii) data is transferred from the host to the device; (iii) the kernel is executed; and (iv) the CPU fetches the result data from the device memory. In the context of distributed systems deployed in the cloud, the consequence of this flow is that client requests must pass through the CPU in order to be accelerated by the FPGA, imposing not only considerable communication overhead [41] but, more importantly, a very tight coupling between the host and the accelerator [39].…”
Section: Introduction
confidence: 99%
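The four-step flow quoted above maps directly onto the standard OpenCL host API. The sketch below is a minimal illustration, not code from the paper: the kernel name "accel", the float payload, and the blocking transfers are assumptions, and platform/device setup, program build, and error handling are elided.

```c
/* Minimal sketch of the four-step host/device flow described in the
 * excerpt, using the standard OpenCL C API. Names and payload types
 * are hypothetical; setup and error handling are omitted. */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>

void run_once(cl_context ctx, cl_command_queue q, cl_kernel accel,
              const float *in, float *out, size_t n) {
    cl_int err;
    size_t bytes = n * sizeof(float);

    /* (i) allocate memory on the device */
    cl_mem d_in  = clCreateBuffer(ctx, CL_MEM_READ_ONLY,  bytes, NULL, &err);
    cl_mem d_out = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, NULL, &err);

    /* (ii) transfer the input from the host to the device over PCIe */
    clEnqueueWriteBuffer(q, d_in, CL_TRUE, 0, bytes, in, 0, NULL, NULL);

    /* (iii) execute the kernel on the accelerator */
    clSetKernelArg(accel, 0, sizeof(cl_mem), &d_in);
    clSetKernelArg(accel, 1, sizeof(cl_mem), &d_out);
    size_t global = n;
    clEnqueueNDRangeKernel(q, accel, 1, NULL, &global, NULL, 0, NULL, NULL);

    /* (iv) the CPU fetches the result back from device memory */
    clEnqueueReadBuffer(q, d_out, CL_TRUE, 0, bytes, out, 0, NULL, NULL);

    clReleaseMemObject(d_in);
    clReleaseMemObject(d_out);
}
```

Every invocation pays steps (i) through (iv) on the host, so each client request is funnelled through CPU memory before reaching the accelerator; this is the communication overhead and tight host-accelerator coupling that the excerpt highlights.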