“…The available performance counters and public data sheets suggest that Intel processors use a single IOTLB for all levels of mapping (e.g., first-level, second-level, nested, and pass-through mappings) (Intel, 2021), whereas AMD processors use two distinct IOTLBs, one for caching Page Directory Entries (PDEs) and one for Page Table Entries (PTEs) (AMD, 2021; Kegel et al., 2016). Additionally, Emmerich et al. (2019) and Neugebauer et al. (2018) have speculated, based on their experiments, that some Intel processors have 64 IOTLB entries and that an IOTLB miss and its subsequent page walk cost around 330 ns. Furthermore, some PCIe devices support Address Translation Services (ATS) (Krause, Hummel & Wooten, 2006; PCI-SIG, 2009), which enable a device to cache address translations locally to minimize latency and provide a scalable, distributed caching solution for the IOMMU.…”
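
To make the quoted figures concrete, the following is a minimal sketch of a toy IOTLB model in Python. It uses the 64-entry size and the roughly 330 ns miss cost mentioned in the excerpt. Everything else is an assumption made for illustration: the 4 KiB page size, the fully associative organization, the LRU replacement policy, and the names `IOTLBModel` and `sweep` are hypothetical and do not describe any vendor's actual hardware.

```python
from collections import OrderedDict

PAGE_SIZE = 4096      # assumption: 4 KiB pages
IOTLB_ENTRIES = 64    # entry count speculated in the cited experiments
MISS_COST_NS = 330    # approximate miss + page-walk cost from the excerpt


class IOTLBModel:
    """Toy fully associative IOTLB with LRU replacement (both assumed)."""

    def __init__(self, entries=IOTLB_ENTRIES):
        self.entries = entries
        self.cache = OrderedDict()  # page number -> placeholder, in LRU order
        self.hits = 0
        self.misses = 0

    def access(self, io_addr):
        """Record one DMA access to the given I/O virtual address."""
        page = io_addr // PAGE_SIZE
        if page in self.cache:
            self.cache.move_to_end(page)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        self.cache[page] = True
        if len(self.cache) > self.entries:
            self.cache.popitem(last=False)  # evict the least recently used page

    def miss_overhead_ns(self):
        """Total time attributed to misses under the fixed-cost assumption."""
        return self.misses * MISS_COST_NS


def sweep(num_pages, rounds):
    """Touch num_pages distinct pages cyclically, rounds times."""
    tlb = IOTLBModel()
    for _ in range(rounds):
        for p in range(num_pages):
            tlb.access(p * PAGE_SIZE)
    return tlb


if __name__ == "__main__":
    print("reach (KiB):", IOTLB_ENTRIES * PAGE_SIZE // 1024)
    for pages in (64, 65):
        tlb = sweep(pages, rounds=10)
        print(pages, "pages:", tlb.misses, "misses,",
              tlb.miss_overhead_ns(), "ns of miss overhead")
```

Under these assumptions the model covers 256 KiB of I/O address space. A working set of 64 pages misses only on the first pass, while a cyclic sweep over 65 pages misses on every access because LRU always evicts the page that is needed next. Real IOTLBs may behave differently, so the sketch only illustrates why exceeding the entry count can be costly.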