Abstract-A long-standing challenge in cluster scheduling is to achieve a high degree of utilization of heterogeneous resources in a cluster. In practice, there exists a substantial disparity between perceived and actual resource utilization: a scheduler might regard a cluster as fully utilized if a large resource request queue is present, yet the actual resource utilization of the cluster can in fact be very low. This disparity leaves resources idle, leading to inefficient resource usage, high operational costs, and an inability to provision services. In this paper we present a new cluster scheduling system, ROSE, based on a multi-layered scheduling architecture with the ability to over-subscribe idle resources to accommodate unfulfilled resource requests. ROSE books idle resources in a speculative manner: instead of waiting for resource allocation to be confirmed by the centralized scheduler, it intelligently requests to launch tasks on machines according to their suitability for over-subscription. A threshold control with timely task rescheduling keeps cluster resources fully utilized without generating task stragglers. Experimental results show that ROSE can almost double the average CPU utilization, from 36.37% to 65.10%, compared with a centralized scheduling scheme, and reduce the workload makespan by 30.11%, with an 8.23% disk utilization improvement over other scheduling strategies.
Index Terms-cluster scheduling, resource management, oversubscription
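The speculative booking and threshold control described above can be pictured with a short sketch. The following Python fragment is a minimal illustration under assumed names and thresholds (Node, CPU_THRESHOLD, place_speculatively are all hypothetical); it is not ROSE's actual implementation.

```python
# Minimal sketch of ROSE-style speculative over-subscription on worker nodes.
# Names and threshold values are illustrative assumptions, not ROSE's API.

from dataclasses import dataclass, field

CPU_THRESHOLD = 0.80  # assumed cap on real usage before over-subscription stops


@dataclass
class Node:
    cpu_allocated: float = 0.0   # fraction promised by the central scheduler
    cpu_used: float = 0.0        # fraction actually consumed
    speculative_tasks: list = field(default_factory=list)

    def idle_fraction(self) -> float:
        # The perceived-vs-actual utilization gap that ROSE harvests.
        return max(0.0, self.cpu_allocated - self.cpu_used)

    def can_oversubscribe(self, task_demand: float) -> bool:
        # Accept a speculative task only while real usage stays under threshold.
        return self.cpu_used + task_demand <= CPU_THRESHOLD


def place_speculatively(nodes, task_demand):
    """Launch a queued task on the node most suitable for over-subscription,
    without waiting for the centralized scheduler's confirmation."""
    candidates = [n for n in nodes if n.can_oversubscribe(task_demand)]
    if not candidates:
        return None  # no suitable node; the request stays in the queue
    best = max(candidates, key=lambda n: n.idle_fraction())
    best.speculative_tasks.append(task_demand)
    best.cpu_used += task_demand
    return best
```

In this reading, the threshold plays the anti-straggler role the abstract mentions: once real usage on a node crosses it, no further speculative tasks land there, and tasks that do slow down would be rescheduled elsewhere.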
Striking a balance between improved cluster utilization and guaranteed application QoS is a long-standing research problem in cluster resource management. The majority of current solutions require a large number of sandboxed experiments over different workload combinations and leverage them to predict possible interference for incoming workloads. This results in non-negligible time complexity that severely restricts their applicability to complex workload co-locations. The purely offline nature of the profiling may also lead to a model-aging problem that drastically degrades model precision. In this paper, we present Perph, a runtime agent on a per-node basis that decouples ML-based performance prediction and resource inference from the centralized scheduler. We exploit the sensitivity of long-running applications to multiple resources to establish a relationship between resource allocation and the resulting performance. We use an Online Gradient Boost Regression Tree (OGBRT) approach to enable continuous model evolution. Once performance degradation is detected, resource inference is conducted to work out a proper slice of resources that is then reallocated to recover the target performance. The integration with the Node Manager (NM) of Apache YARN shows that the throughput of a Kafka data-streaming application is 2.0x and 1.82x that of the isolation execution schemes in native YARN and the pure cgroup cpu subsystem, respectively. In TPC-C benchmarking, throughput is also improved by 35% and 23% over native YARN and the cgroup cpu subsystem, respectively.
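The two halves of Perph's loop, continuous model evolution and resource inference, can be sketched as follows. OGBRT itself is not available in scikit-learn, so the sketch below uses warm-start gradient boosting (growing extra trees on fresh samples) purely as a stand-in for the online-learning idea; the feature layout and all function names are assumptions.

```python
# Rough stand-in for Perph's online model evolution and resource inference.
# warm_start GBRT approximates OGBRT's incremental updates; it is not the
# paper's algorithm. Feature layout (cpu, mem_gb) is an assumption.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(n_estimators=50, warm_start=True)


def update_model(X_new, y_new, extra_trees=10):
    """Continuous evolution: grow additional trees on fresh runtime samples
    instead of retraining offline from scratch (mitigating model aging)."""
    model.n_estimators += extra_trees
    model.fit(X_new, y_new)


def infer_resources(current_alloc, target_perf, step=0.1, max_cpu=8.0):
    """On detected degradation, search for the smallest CPU slice whose
    predicted performance recovers the target."""
    cpu = current_alloc["cpu"]
    while cpu <= max_cpu:
        features = np.array([[cpu, current_alloc["mem_gb"]]])
        if model.predict(features)[0] >= target_perf:
            return {**current_alloc, "cpu": cpu}
        cpu += step
    return None  # target unreachable within this node's capacity
```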
To achieve a high degree of resource utilization, production clusters need to co-schedule diverse workloads onto shared resources, including both batch analytics jobs with short-lived tasks and long-running applications (LRAs) that execute for hours to months. The microservice architecture gives rise to distributed LRAs (DLRAs), comprising multiple interconnected microservices that execute in long-lived distributed containers and serve massive numbers of user requests. Detecting and mitigating QoS violations becomes even more intractable due to network uncertainties and latency propagation across dependent microservices. However, current resource managers are only responsible for resource allocation among applications/jobs and are agnostic to runtime QoS, such as latency, at the application level. State-of-the-art QoS-aware scheduling approaches are designed for monolithic applications, without considering the spatio-temporal performance variability across distributed microservices. In this paper, we present TOPOSCH, a new scheduling and execution framework that prioritizes the QoS of DLRAs whilst balancing the performance of batch jobs and maintaining high cluster utilization by harvesting idle resources. TOPOSCH tracks the footprint of every request across microservices and uses critical path analysis, based on the end-to-end latency graph, to identify microservices at high risk of QoS violation. Based on microservice- and node-level risk assessment, we intervene in batch scheduling by adaptively reducing the resources visible to batch tasks, thereby delaying their execution to give way to DLRAs. We propose a prediction-based vertical resource auto-scaling mechanism, aided by resource-performance modeling and fine-grained resource inference and access control, for prompt recovery from QoS violations. A cost-effective preemption mechanism ensures low-cost task preemption and resource reclamation during auto-scaling. TOPOSCH is integrated with Apache YARN, and experiments show that it outperforms other baselines in guaranteeing the performance of DLRAs at an acceptable cost in batch job slowdown: with TOPOSCH, the tail latency of DLRAs is on average merely 1.12x that of executing alone, with a 26% JCT increase for Spark analytics jobs.
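The critical path analysis over the end-to-end latency graph admits a compact illustration. The sketch below is a generic longest-path computation over a microservice call DAG, with an assumed risk rule (latency above 1.2x of a per-service baseline); the graph encoding, baselines, and risk factor are illustrative, not TOPOSCH's actual parameters.

```python
# Illustrative critical path analysis over a per-request latency graph.
# RISK_FACTOR and the baseline rule are assumptions for the sketch.

from graphlib import TopologicalSorter

RISK_FACTOR = 1.2  # assumed: latency above 1.2x baseline marks a risky service


def critical_path(upstreams, latency):
    """upstreams: {service: set of upstream services}; latency: per-service
    latency for one request. Returns the chain dominating end-to-end latency."""
    finish, parent = {}, {}
    for svc in TopologicalSorter(upstreams).static_order():
        deps = upstreams.get(svc, set())
        start = max((finish[u] for u in deps), default=0.0)
        finish[svc] = start + latency[svc]
        parent[svc] = max(deps, key=lambda u: finish[u], default=None)
    # Walk back from the slowest sink to recover the critical path.
    node, path = max(finish, key=finish.get), []
    while node is not None:
        path.append(node)
        node = parent[node]
    return list(reversed(path))


def risky_services(path, latency, baseline):
    """Services on the critical path whose latency exceeds the risk threshold;
    batch tasks on their nodes would then see reduced visible resources."""
    return [s for s in path if latency[s] > RISK_FACTOR * baseline[s]]


# Example: frontend -> cart -> checkout, with checkout also calling frontend.
deps = {"frontend": set(), "cart": {"frontend"}, "checkout": {"cart", "frontend"}}
lat = {"frontend": 5.0, "cart": 12.0, "checkout": 8.0}
base = {"frontend": 5.0, "cart": 6.0, "checkout": 8.0}
print(risky_services(critical_path(deps, lat), lat, base))  # ['cart']
```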