2020
DOI: 10.1109/tpds.2020.2970013

Performance-Aware Speculative Resource Oversubscription for Large-Scale Clusters

Abstract: It is a long-standing challenge to achieve a high degree of resource utilization in cluster scheduling. Resource oversubscription has become a common practice in improving resource utilization and cost reduction. However, current centralized approaches to oversubscription suffer from the issue with resource mismatch and fail to take into account other performance requirements, e.g., tail latency. In this paper we present ROSE, a new resource management platform capable of conducting performance-aware resource …

Cited by 23 publications (28 citation statements)
References 31 publications
“…Monitoring. Monitoring is the key to application aware optimization [10], [26], [53], [62], [17], [63]. In order to obtain a fine-grained view of the infrastructure, Horus leverages cAdvisor 7 , a container monitoring framework.…”
Section: System Implementation (mentioning)
confidence: 99%
“…Understanding and achieving high resource utilization for heterogeneous workloads-including DL-in cloud computing is an important topic [30], [28], [21], [22], [14], [62], [8], [6], [17], [18], [10]. GPU profiling.…”
Section: Related Work (mentioning)
confidence: 99%
“…drones have proliferated recently and widely adopted in numerous industrial or commercial areas such as weather observation [1], disaster management [2], agricultural irrigation [3], etc. The advancement of such applications is mainly propelled by diverse deep neural networks models [4], [5], [6], [7] and massive-scale high performance computing [8], [9]. While promising, security and privacy issues become the main concerns in the traffic management for the safe presence of UAVs in the airspace [10], [11].…”
Section: Introduction (mentioning)
confidence: 99%
“…However, they are not innately designed to consider intercluster (cluster-to-cluster) performance when enacting workload placement and execution decisions. This is problematic as clusters leveraged for cloud computing are exposed to network volatility [3], dynamic utilization [30], and heterogeneous scheduling architectures [7] -all which are intrinsic to federated cluster environments. The majority of federated orchestration systems only consider resource demand and reservation [5], [28], and omit characteristics at network-level (bandwidth, latency), node-level (interference, contention) and cluster-level (scheduler type).…”
Section: Introduction (mentioning)
confidence: 99%