Virtualization promised to dramatically increase server utilization levels, yet many data centers are still only lightly loaded. In some ways, big data applications are an ideal fit for using this residual capacity to perform meaningful work, but the high level of interference between interactive and batch processing workloads currently prevents this from being a practical solution in virtualized environments. Further, the variable nature of spare capacity may make it difficult to meet big data application deadlines. In this work we propose two schedulers: one in the virtualization layer designed to minimize interference on high-priority interactive services, and one in the Hadoop framework that helps batch processing jobs meet their own performance deadlines. Our approach uses performance models to match Hadoop tasks to the servers that will benefit them the most, and deadline-aware scheduling to effectively order incoming jobs. We use admission control to meet deadlines even when resources are overloaded. The combination of these schedulers allows data center administrators to safely mix resource-intensive Hadoop jobs with latency-sensitive web applications, and still achieve predictable performance for both. We have implemented our system using Xen and Hadoop, and our evaluation shows that our schedulers allow a mixed cluster to reduce web response times by more than tenfold compared to the existing Xen Credit Scheduler, while meeting more Hadoop deadlines and lowering total task execution times by 6.5%.
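The abstract mentions deadline-aware job ordering combined with admission control. The paper does not give the algorithm, but one common realization of this idea is an earliest-deadline-first queue that rejects a new job whenever admitting it would cause some queued job to miss its deadline under a serial-execution estimate. The sketch below is a minimal, hypothetical illustration of that pattern; the class and field names are invented and the runtime estimates stand in for the paper's performance models.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    deadline: float                              # absolute deadline (seconds)
    est_runtime: float = field(compare=False)    # predicted execution time from a performance model
    name: str = field(compare=False, default="")

class DeadlineScheduler:
    """Earliest-deadline-first ordering with simple admission control (illustrative sketch)."""

    def __init__(self):
        self.queue = []  # min-heap ordered by deadline

    def admit(self, job, now):
        """Admit a job only if every queued job (and the new one) can still
        finish before its deadline, assuming jobs run serially in EDF order."""
        candidate = sorted(self.queue + [job])   # simulate EDF execution order
        t = now
        for j in candidate:
            t += j.est_runtime
            if t > j.deadline:
                return False                     # admitting would cause a missed deadline
        heapq.heappush(self.queue, job)
        return True

    def next_job(self):
        """Dispatch the job with the earliest deadline."""
        return heapq.heappop(self.queue) if self.queue else None
```

The admission check is intentionally conservative: it treats the cluster as a single serial resource, which overestimates contention but guarantees that accepted jobs remain schedulable under the model's runtime estimates.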
Every physical machine in today's typical data center is backed by storage devices ranging from hundreds of gigabytes to terabytes in capacity. Data center vendors usually use hard disk drives for their back-end storage because they are cheap and reliable. However, increased I/O traffic to the back-end storage from one or more of the VMs hosted on a physical machine can significantly increase overall access time due to contention. This is unsuitable for low-latency interactive applications that may be co-located with other I/O-intensive applications. In this paper we present Multi-Cache, a multi-layer cache management system that uses a combination of cache devices of varied speed and cost, such as solid state drives and non-volatile memories, to mitigate this problem. Multi-Cache partitions each device dynamically at runtime according to the workload of each VM and its priority. We use a heuristic optimization technique that ensures maximum utilization of the caches, resulting in a high hit rate. Our weighted partitioning policy improves latency by up to 72% for individual workloads, and increases the overall hit rate by up to 31% for a host running several workloads together, in comparison to standard LRU caching algorithms.
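The abstract describes partitioning each cache device among VMs according to workload and priority under a weighted policy. The details are not given in the abstract, but the core idea can be sketched as a proportional-share allocation: each VM receives cache blocks in proportion to a weight combining its priority and an estimated caching benefit. The function below is a hypothetical illustration; the weight formula, field names, and rounding policy are assumptions, not Multi-Cache's actual heuristic.

```python
def partition_cache(capacity, vms):
    """Split a cache device's block capacity among VMs proportionally to
    weight = priority * estimated benefit (a stand-in for a hit-rate model).
    Returns a dict mapping VM name to allocated blocks."""
    weight = lambda vm: vm["priority"] * vm["benefit"]
    total = sum(weight(vm) for vm in vms)
    shares = {vm["name"]: int(capacity * weight(vm) / total) for vm in vms}
    # Integer truncation can leave a few blocks unassigned; give the
    # remainder to the highest-weight VM so the device stays fully used.
    leftover = capacity - sum(shares.values())
    top = max(vms, key=weight)
    shares[top["name"]] += leftover
    return shares
```

In a multi-layer setting this allocation would be computed per device (DRAM, NVM, SSD), with faster layers favoring the latency-sensitive, high-priority VMs; repartitioning periodically at runtime lets the split track shifting workloads, as the abstract describes.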