2019
DOI: 10.1051/epjconf/201921403056

Improving the Scheduling Efficiency of a Global Multi-Core HTCondor Pool in CMS

Abstract: Scheduling multi-core workflows in a global HTCondor pool is a multi-dimensional problem whose solution depends on the requirements of the job payloads, the characteristics of available resources, and the boundary conditions such as fair share and prioritization imposed on the job matching to resources. Within the context of a dedicated task force, CMS has increased significantly the scheduling efficiency of workflows in reusable multi-core pilots by various improvements to the limitations of the GlideinWMS pi…
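The kind of multi-core request the abstract refers to can be pictured with a short sketch using the HTCondor Python bindings. This is a minimal illustration only, not the CMS submission machinery: the payload script, the +DESIRED_Sites attribute, and the resource sizes are assumptions, and the submission call assumes a recent version of the bindings.

```python
# Minimal sketch of a multi-core job description, assuming the htcondor
# Python bindings and a local schedd. The payload script, site attribute,
# and resource sizes below are hypothetical, not values from the paper.
import htcondor

job = htcondor.Submit({
    "executable":     "run_payload.sh",            # hypothetical payload wrapper
    "arguments":      "--threads 8",
    "request_cpus":   "8",                         # multi-core request to be matched
    "request_memory": "16000",                     # MB, roughly 2 GB per core
    "request_disk":   "20000000",                  # KB
    "+DESIRED_Sites": '"T1_US_FNAL,T2_CH_CERN"',   # assumed CMS-style site list attribute
    "log":            "payload.log",
    "output":         "payload.out",
    "error":          "payload.err",
})

schedd = htcondor.Schedd()
result = schedd.submit(job)                        # queue one job in the local schedd
print("Submitted as cluster", result.cluster())
```

The negotiator matches such a request against pilot slots that advertise at least the requested cores and memory, which is why the per-job resource declaration is central to filling multi-core pilots efficiently.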

Cited by 3 publications (2 citation statements); References 6 publications.

“…The Global Pool matches diverse CMS workloads, which include single-core as well as multicore requests [378] to these resources. A successful scheduling is achieved by simultaneously ensuring that all available resources are efficiently used [379], a fair share of resources between users is reached, and the completion of CMS tasks follows their prioritization, minimizing job failures and manual intervention. While the SI typically manages 100k to 150k simultaneously executing tasks, recent scalability tests [380] have demonstrated the capacity of the infrastructure to sustain in excess of half a million concurrently running jobs.…”
Section: Central Processing and Production
confidence: 99%
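The efficient-use aspect mentioned in the excerpt above can be monitored, for instance, by comparing allocated to total cores across the pool's partitionable pilot slots. The sketch below does this with the HTCondor Python bindings; the collector hostname is a placeholder, not the address of the CMS Global Pool.

```python
# Minimal sketch, assuming read access to a pool collector via the htcondor
# Python bindings; the collector hostname is a placeholder.
import htcondor

collector = htcondor.Collector("collector.example.org")
slots = collector.query(
    htcondor.AdTypes.Startd,
    constraint="PartitionableSlot",                   # only the pilots' mother slots
    projection=["Name", "TotalSlotCpus", "Cpus"],
)

total_cores = sum(ad.get("TotalSlotCpus", 0) for ad in slots)
idle_cores  = sum(ad.get("Cpus", 0) for ad in slots)  # Cpus = cores not yet carved out
used_cores  = total_cores - idle_cores

if total_cores:
    print(f"Cores in use: {used_cores}/{total_cores} "
          f"({100.0 * used_cores / total_cores:.1f}%)")
```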
“…Similarly, one can find examples of workload abstraction for HTCondor in HEP as well. Collaborations such as CMS and ATLAS use higher-level concepts such as pilot jobs to steer the work of batch pools across their sites world-wide [96,97]. Sometimes these tools go even further and provide programmatic interfaces to help users in organising all the files needed for an analysis, as is the case with the ATLAS PanDA and the LHCb GANGA systems [97,98].…”
Section: State of the Art
confidence: 99%