Abstract-Intrusion detection relies on the ability to obtain reliable and trustworthy measurements, while adversaries will inevitably target such monitoring and security systems to prevent their detection. This has led to a number of proposals for using coprocessors as protected monitoring instances. However, such coprocessors suffer from two problems, namely the ability to perform measurements without relying on the host system and the speed at which such measurements can be performed. The availability of smart, high-performance subsystems in commodity computer systems such as graphics processing units (GPU) strongly motivates an investigation into novel ways of achieving the twin objectives of self-protected observation and monitoring systems and sufficient measurement frequency. This, however, gives rise to performance penalties imposed by memory synchronization particularly in non-uniform memory architectures (NUMA) even for the case of direct memory access (DMA) transfers. Based on prior work detailing a cost model for synchronization of memory access in such advanced architectures, we report an experimental validation of the cost model using an IEEE 1394 DMA bus mastering environment, which provides full access to the measurement target's main memory and involves multiple bus bridges and concomitant synchronization mechanisms. We observed up to 25% performance degradation, highlighting the need for efficient sampling strategies for both, memory size and a preference for quiescent data structures for monitoring executed by off-host devices.