2019
DOI: 10.1007/978-3-030-29400-7_10

Improving Fairness in a Large Scale HTC System Through Workload Analysis and Simulation

Abstract: Monitoring and analyzing the execution of a workload is at the core of the operation of data centers. It allows operators to verify that the operational objectives are satisfied, or to detect and react to any unexpected and unwanted behavior. However, the scale and complexity of large workloads, composed of millions of jobs executed each month on several thousands of cores, often limit the depth of such an analysis. This may lead operators to overlook some phenomena that, while not harmful at a global scale, can be detrimental…

Cited by 5 publications
(5 citation statements)
References 14 publications
“…Also users and/or groups are often subject to an upper bound on the amount of resources they can use simultaneously. For this purpose, Alea provides CPU quotas, that guarantee that a user/group will not exceed the corresponding maximum allowed share of resources [2].…”
Section: Detailed System Simulation Capabilities
mentioning
confidence: 99%
“…Using Alea, we were able to model the system and evaluate new setups for the system's queues and the per-group CPU quotas. This new setup allowed for improved fairness for local users, by better balancing their wait times with the wait times of grid-originating jobs [2].…”
Section: Improving Fairness In Large HTC System
mentioning
confidence: 99%
“…In a previous study we showed that two distinct sub-workloads are executed at CC-IN2P3 [2]. Some jobs are submitted by a small number of large user groups through a Grid middleware, at a nearly constant rate and with an important upstream control of the submissions, while Local users from about 60 different groups directly submit their jobs to the batch system.…”
mentioning
confidence: 99%
“…This workload is composed of 7,749,500 Grid jobs and 5,748,922 Local jobs, for a total of 13,498,422 jobs. Hereafter we focus only on the Local jobs, as they experience larger wait times than Grid jobs [2].…”
mentioning
confidence: 99%
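
The first citation statement above describes per-group CPU quotas as an upper bound on the resources a group can use simultaneously. A minimal sketch of that idea follows, assuming a simple core-count model. The class `QuotaTracker`, its methods, and the group name `group_a` are hypothetical illustrations, not Alea's API or the configuration studied in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class QuotaTracker:
    """Hypothetical model: cores in use per group, capped by a per-group maximum."""

    max_cores: dict                      # group name -> maximum simultaneous cores
    in_use: dict = field(default_factory=dict)  # group name -> cores currently used

    def can_start(self, group: str, cores: int) -> bool:
        """Return True if starting a job keeps the group within its quota."""
        limit = self.max_cores.get(group)
        if limit is None:
            # Assumption: a group with no configured quota is unrestricted.
            return True
        return self.in_use.get(group, 0) + cores <= limit

    def start(self, group: str, cores: int) -> bool:
        """Account for a job start; refuse it if the quota would be exceeded."""
        if not self.can_start(group, cores):
            return False
        self.in_use[group] = self.in_use.get(group, 0) + cores
        return True

    def finish(self, group: str, cores: int) -> None:
        """Release the cores held by a finished job."""
        self.in_use[group] = max(0, self.in_use.get(group, 0) - cores)


tracker = QuotaTracker(max_cores={"group_a": 8})
first = tracker.start("group_a", 6)    # fits within the 8-core quota
second = tracker.start("group_a", 4)   # would reach 10 cores, so it must wait
tracker.finish("group_a", 6)
third = tracker.start("group_a", 4)    # fits once the first job has finished
```

In this sketch a job that would exceed the quota is simply refused; a real scheduler would keep it queued, which is how a quota setting can affect wait times.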