The performance of page-based software shared virtual memory (SVM) is still far from that achieved on hardware-coherent distributed shared memory (DSM) systems. The interrupt cost for asynchronous protocol processing has been found to be a key source of performance loss and complexity. This paper shows that by providing simple and general support for asynchronous message handling in a commodity network interface (NI), and by altering SVM protocols appropriately, protocol activity can be decoupled from asynchronous message handling and the need for interrupts or polling can be eliminated. The NI mechanisms needed are generic, not SVM-dependent. They also require neither visibility into the node memory system nor code instrumentation to identify memory operations. We prototype the mechanisms and such a synchronous home-based LRC protocol, called GeNIMA (GEneral-purpose Network Interface support in a shared Memory Abstraction), on a cluster of SMPs with a programmable NI, though the mechanisms are simple and do not require programmability. We find that the performance improvements are substantial, bringing performance on a small-scale SMP cluster much closer to that of hardware-coherent shared memory for many applications, and we show the value of each of the mechanisms in different applications. Application performance improves by about 37% on average for reasonably well performing applications, even on our relatively slow programmable NI, and more for others. We discuss the key remaining bottlenecks at the protocol level and use a firmware performance monitor in the NI to understand the interactions with and the implications for the communication layer.
An important application domain for online services is interactive, multiplayer games. An essential component for realizing these services is game servers that can support large numbers of simultaneous users in a single game world. In this work, we use a popular, 3D, interactive, multiplayer game server, Quake, to study this important class of applications. We present the design and implementation of a multithreaded version of the server. We examine the challenges in scaling this class of applications to large numbers of users, mainly task decomposition and synchronization. We present preliminary performance results for a server with up to eight processors. We find that: (i) scaling interactive, multiplayer games that exhibit fine-grain interactions in a detailed 3D world to large numbers of players is a challenging task; (ii) the main bottlenecks are lock synchronization during request processing and high wait times due to fine-grain workload imbalances at global synchronization points; and (iii) significant future improvements are possible using techniques that take advantage of game-specific knowledge.