We present a unique, advanced job scheduler for the widely used TORQUE Resource Manager. Unlike common schedulers that rely on queuing and heuristics, our solution uses planning (job schedule construction) and schedule optimization by a local search-inspired metaheuristic, achieving better predictability, performance and fairness than common queue-based approaches. The suitability and good performance of our solution are demonstrated both by synthetic experiments and by real-life performance results from the deployment of our scheduler in the production infrastructure of the Czech Centre for Education, Research and Innovation in ICT (CERIT Scientific Cloud).
In this work we present a major extension of the open source TORQUE Resource Manager system. We have replaced the naive scheduler provided in the TORQUE distribution with a complex scheduling system that plans job execution ahead and predicts the behavior of the system. It is based on a job schedule, which represents the jobs' execution plan. Such functionality is very useful, as users can consult the plan to see when and where their jobs will be executed. Moreover, created plans can be easily evaluated in order to identify possible inefficiencies. Repair actions can then be taken immediately to fix those inefficiencies, producing better schedules with respect to the considered criteria.
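The plan, evaluate and repair cycle described above can be illustrated with a minimal sketch. This is not the scheduler's actual implementation: the `Job` type, the identical-machine model, the average-wait criterion and the single-job "move" repair step are all assumptions chosen only to keep the example small.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    runtime: int  # estimated runtime in arbitrary time units (hypothetical)


def build_plan(order, n_machines):
    """Construct a schedule: place each job, in order, on the earliest-free machine.

    Returns {job name: (machine index, planned start time)}.
    """
    free_at = [0] * n_machines
    plan = {}
    for job in order:
        machine = min(range(n_machines), key=lambda i: free_at[i])
        plan[job.name] = (machine, free_at[machine])
        free_at[machine] += job.runtime
    return plan


def avg_wait(plan):
    """Evaluate a plan by the average planned start time (all jobs arrive at t=0)."""
    return sum(start for _, start in plan.values()) / len(plan)


def optimize(order, n_machines, iterations=200, seed=0):
    """Local search: move one random job to a random position, keep strict improvements."""
    rng = random.Random(seed)
    best = list(order)
    best_cost = avg_wait(build_plan(best, n_machines))
    for _ in range(iterations):
        candidate = list(best)
        job = candidate.pop(rng.randrange(len(candidate)))
        candidate.insert(rng.randrange(len(candidate) + 1), job)
        cost = avg_wait(build_plan(candidate, n_machines))
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


jobs = [Job("a", 8), Job("b", 1), Job("c", 6), Job("d", 2)]
initial = build_plan(jobs, n_machines=2)
improved_order, improved_cost = optimize(jobs, n_machines=2)
print(initial)  # every job's planned machine and start time is known in advance
print(avg_wait(initial), improved_cost)
```

Because the plan records a machine and start time for every job, it can be shown to users and scored before anything runs, which is the property a pure queue-based scheduler lacks. A real deployment would also have to handle job arrivals, multi-CPU requests, inaccurate runtime estimates and fairness criteria, none of which are modeled here.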