Interrupt processing can be a major bottleneck in the end-to-end performance of Gigabit networks. The performance of Gigabit network end hosts or servers can be severely degraded by the interrupt overhead caused by heavy incoming traffic. In particular, excessive latency and significant degradation in system throughput can be encountered. Also, user applications may livelock as CPU power is mostly consumed by interrupt handling and protocol processing. A number of interrupt handling schemes have been proposed and employed to mitigate the interrupt overhead and improve OS performance. Among the most popular interrupt handling schemes are normal interruption, polling, interrupt coalescing, and disabling and enabling of interrupts. In previous work, we presented a preliminary analytical study and models of normal interruption and interrupt coalescing. In this article, we extend our analysis and modeling to include polling and the scheme of interrupt disabling and enabling. For polling, we study both pure (or FreeBSD-style) polling and Linux NAPI polling. The performance of all these schemes is compared using both mathematical analysis and discrete-event simulation. The performance is studied in terms of three key performance indicators: throughput, system latency, and the residual CPU bandwidth available for user applications. As opposed to our previous work, we consider not only Poisson traffic, but also bursty traffic with an empirical packet size distribution. Our analysis and simulation work gives insight into predicting the system performance and behavior when employing a certain interrupt handling scheme. It is concluded that no single interrupt handling scheme outperforms all other schemes under all traffic conditions. Based on the obtained results, we propose and discuss a novel hybrid scheme of interrupt disabling-enabling and pure polling in order to attain peak performance under low and heavy traffic loads.
In the cloud, ensuring proper elasticity for hosted applications and services is a challenging problem and far from being solved. To achieve proper elasticity, the minimal number of cloud resources needed to satisfy a particular service level objective (SLO) requirement has to be determined. In this paper, we present an analytical model based on Markov chains to predict the number of cloud instances or virtual machines (VMs) needed to satisfy a given SLO performance requirement such as response time, throughput, or request loss probability. To estimate these SLO performance metrics, our analytical model takes as input the offered workload, the number of VM instances, and the capacity of each VM instance. The correctness of the model has been verified using discrete-event simulation. Our model has also been validated using experimental measurements conducted on the Amazon Web Services cloud platform.
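The abstract does not give the structure of the Markov chain, so the following is only a minimal sketch of the general idea, assuming a textbook M/M/c/K queue (Poisson arrivals, exponential service, c identical VMs, a finite waiting room) rather than the authors' actual model. The function names `mmck_metrics` and `min_vms` and all parameters are hypothetical and chosen for illustration.

```python
def mmck_metrics(lam, mu, c, queue_size):
    """Steady-state metrics of an M/M/c/K queue (assumed model, not the paper's).

    lam: offered workload (requests per second)
    mu: capacity of one VM (requests per second)
    c: number of VM instances
    queue_size: waiting slots, so total system capacity is K = c + queue_size
    """
    K = c + queue_size
    a = lam / mu
    # Unnormalized birth-death probabilities, built iteratively to avoid
    # large factorials: state n is served at rate min(n, c) * mu.
    weights = [1.0]
    for n in range(1, K + 1):
        weights.append(weights[-1] * a / min(n, c))
    total = sum(weights)
    probs = [w / total for w in weights]

    loss = probs[K]                      # an arrival is dropped when the system is full
    throughput = lam * (1.0 - loss)      # rate of accepted requests
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    response_time = mean_in_system / throughput  # Little's law
    return {"loss": loss, "throughput": throughput, "response_time": response_time}


def min_vms(lam, mu, queue_size, max_response_time, max_loss, max_vms=1000):
    """Smallest number of VMs meeting both SLO bounds, or None if none is found."""
    for c in range(1, max_vms + 1):
        m = mmck_metrics(lam, mu, c, queue_size)
        if m["response_time"] <= max_response_time and m["loss"] <= max_loss:
            return c
    return None


if __name__ == "__main__":
    # Illustrative numbers only: 80 req/s offered, 10 req/s per VM, 20 waiting slots.
    print(min_vms(lam=80.0, mu=10.0, queue_size=20,
                  max_response_time=0.15, max_loss=0.001))
```

The search simply increases the VM count until both SLO bounds hold. Under this assumed model, a real elasticity controller would rerun it whenever the measured workload changes.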
SUMMARY: The paper presents analytical and simulation models to study the impact of interrupt overhead on the operating system throughput of network elements such as PC-based routers, servers, and hosts when subjected to high-speed network traffic. Under such high network traffic, the system throughput is negatively affected by the interrupt overhead caused by the incoming traffic. We first present an analytical model for the ideal system, in which interrupt overhead is ignored. We then present two models that describe the impact of a high interrupt rate on system throughput. One model covers PIO, in which network adapters are not equipped with DMA engines; the other covers DMA, in which network adapters are equipped with DMA engines. The paper also describes detailed discrete-event simulation models for the ideal system and for systems with DMA and PIO. Simulation results as well as reported experimental measurements show that our analytical models are valid and give a good approximation. Our analysis and simulation work can be valuable in providing insight to understand and predict system behaviour, as well as in improving and maintaining good host performance. The paper analytically identifies critical design operation points, such as the overload condition. The paper also proposes solutions and recommendations for improving performance.