User Space Network Drivers

Emmerich, Paul; Pudelko, Maximilian; Bauer, Simon; Huber, S.; Zwickl, Thomas; Carle, Georg

doi:10.1109/ancs.2019.8901894

Cited by 8 publications

(2 citation statements)

References 7 publications


“…Figure 2.1 depicts a schematic overview of the processes performed during both reception and transmission of packets. A more detailed, technical description of the process is presented in [42] and [43]. The packet ingress path starts with a packet arriving at the Network Interface Card (NIC), depicted as (1) in the figure.…”

Section: Software Packet Processing Overview (mentioning)

confidence: 99%

Performance Evaluation of Next-Generation Data Plane Architectures and their Components

Geissler

Hoßfeld

2023

NOMS 2023 - 2023 IEEE/IFIP Network Operations and Management Symposium


“…The available performance counters and public data sheets suggest that Intel processors use a single IOTLB for all levels of mapping (e.g., first-level, second-level, nested, and pass-through mappings) (Intel, 2021), whereas AMD processors use two distinct IOTLBs for caching Page Directory Entry (PDE) and Page Table Entry (PTE) (AMD, 2021; Kegel et al., 2016). Additionally, Emmerich et al. (2019) and Neugebauer et al. (2018) have speculated (based on their experiments) that the number of IOTLB entries for some Intel processors is 64 and the cost of an IOTLB miss & its subsequent page walk is around 330 ns. Furthermore, some PCIe devices support Address Translation Service (ATS) (Krause, Hummel & Wooten, 2006; PCI-SIG, 2009) that enables devices to cache the address translation in a local cache to minimize latency and provide a scalable distributed caching solution for IOMMU.…”

Section: Introduction (mentioning)

confidence: 99%

Overcoming the IOTLB wall for multi-100-Gbps Linux-based networking

Farshin

Rizzo

Elmeleegy

et al. 2023

PeerJ Computer Science


This article explores opportunities to mitigate the performance impact of IOMMU on high-speed network traffic, as used in the Linux kernel. We first characterize IOTLB behavior and its effects on recent Intel Xeon Scalable & AMD EPYC processors at 200 Gbps, by analyzing the impact of different factors contributing to IOTLB misses and causing throughput drop (up to 20% compared to the no-IOMMU case in our experiments). Secondly, we discuss and analyze possible mitigations, including proposals and evaluation of a practical hugepage-aware memory allocator for the network device drivers to employ hugepage IOTLB entries in the Linux kernel. Our evaluation shows that using hugepage-backed buffers can completely recover the throughput drop introduced by IOMMU. Moreover, we formulate a set of guidelines that enable network developers to tune their systems to avoid the “IOTLB wall”, i.e., the point where excessive IOTLB misses cause throughput drop. Our takeaways signify the importance of having a call to arms to rethink Linux-based I/O management at higher data rates.
