2018
DOI: 10.1007/978-3-319-77398-8_1
Experience and Practice of Batch Scheduling on Leadership Supercomputers at Argonne

Cited by 22 publications (14 citation statements)
References 5 publications
“…A good solution is supposed to achieve good scheduling performance, such as low average job wait time and high system utilization, with the minimum scheduling overhead. Typical HPC systems tolerate 10-30 s of scheduling delay [17,18]. It is crucial to evaluate the performance and overhead of new RL scheduling methods before deployment.…”
Section: Impact (mentioning)
confidence: 99%
“…However, fairness and utilization are conflicting goals and aggressively using fair sharing can hurt cluster utilization [19]. In contrast, HPC systems are designed to run large jobs and prefer users running large jobs [10]. Therefore, fair sharing is not a concern in HPC scheduling.…”
Section: Multi-resource Scheduling (mentioning)
confidence: 99%
“…In the past, a number of scheduling policies have been proposed, and one of the widely used policies is FCFS, which sorts the jobs in the order of their arrivals. At ALCF, to support the mission of running large-scale capability jobs, a utility-based scheduling policy, named WFP, is deployed which periodically calculates a priority increment for each waiting job [10,42]. EASY backfilling is a commonly used strategy to enhance system utilization, where subsequent jobs are allowed to skip ahead under the condition that they do not delay the job at the head of the queue [30].…”
Section: Background and Related Work 2.1 HPC Scheduling (mentioning)
confidence: 99%
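To make the two mechanisms named in this excerpt concrete, below is a minimal Python sketch that combines a WFP-style priority score with EASY backfilling. The cubic form of the score and every name in the sketch (Job, wfp_score, easy_backfill, the shadow-time bookkeeping) are illustrative assumptions, not the Cobalt scheduler actually deployed at ALCF.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    nodes: int              # requested node count
    walltime: float         # requested walltime (seconds)
    wait_time: float = 0.0  # time already spent waiting in the queue (seconds)
    time_left: float = 0.0  # for running jobs: remaining walltime (seconds)

def wfp_score(job: Job) -> float:
    """WFP-style utility score: grows with queue wait relative to requested
    walltime and with job size, so large, long-waiting jobs rise to the top
    (the cubic form is an assumption for illustration)."""
    return (job.wait_time / job.walltime) ** 3 * job.nodes

def easy_backfill(waiting: list[Job], running: list[Job], total_nodes: int) -> list[Job]:
    """Pick jobs to start now: highest WFP score first, then backfill later
    jobs only if they cannot delay the highest-priority waiting job."""
    queue = sorted(waiting, key=wfp_score, reverse=True)
    free = total_nodes - sum(j.nodes for j in running)
    started: list[Job] = []

    # Start jobs from the head of the priority order while they fit.
    while queue and queue[0].nodes <= free:
        job = queue.pop(0)
        free -= job.nodes
        started.append(job)
    if not queue:
        return started

    # The head job does not fit: find its "shadow" start time, i.e. when
    # enough running jobs will have released nodes for it to start.
    head = queue[0]
    avail, shadow = free, 0.0
    for time_left, nodes in sorted((j.time_left, j.nodes) for j in running):
        avail += nodes
        if avail >= head.nodes:
            shadow = time_left
            break
    extra = avail - head.nodes  # nodes the head job will leave unused at shadow time

    # Backfill: a later job may start now only if it finishes before the
    # shadow time, or if it fits in the nodes the head job will not need.
    for job in queue[1:]:
        if job.nodes > free:
            continue
        if job.walltime <= shadow:
            free -= job.nodes
            started.append(job)
        elif job.nodes <= extra:
            free -= job.nodes
            extra -= job.nodes
            started.append(job)
    return started
```

In this form the priority order is recomputed on every scheduling pass, which loosely matches the excerpt's description of a periodically recalculated priority increment; the real policy and its parameters are described in the cited paper.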
“…In turn, the Argonne Leadership Computing Facility (ALCF) [23], a division of Argonne National Laboratory, states the following goal for operating its systems: utilization of the system's compute time is an important objective at ALCF, but the overriding goal is to let extreme-scale jobs start as quickly as possible [24]. From 2013 to 2017 the "Mira" supercomputer provided access to its resources for more than 550 projects and 1,000 users, and more than 280,000 jobs were run on the system.…”
unclassified
“…From 2013 to 2017 the "Mira" supercomputer provided access to its resources for more than 550 projects and 1,000 users, and more than 280,000 jobs were run on the system. To meet this ambitious goal, ALCF moved away from the traditional approach to cluster resource scheduling: they devised their own "utility function" and implemented a dedicated queueing system that lets them pursue this objective as effectively as possible [24].…”
unclassified