Yuping Fan scite author profile

High-performance computing (HPC) is undergoing significant changes. Emerging HPC applications include both compute-intensive and data-intensive workloads. To meet the intense I/O demand of emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneity of hardware devices, combined with these workload changes, forces schedulers to consider multiple resources beyond CPUs (e.g., burst buffers) in decision making. In this study, we present a multi-resource scheduling scheme named BBSched that schedules user jobs based not only on their CPU requirements but also on other schedulable resources such as burst buffers. BBSched formulates scheduling as a multi-objective optimization (MOO) problem and rapidly solves it using a multi-objective genetic algorithm. The multiple solutions generated by BBSched enable system managers to explore potential trade-offs among various resources and thereby obtain better utilization of all resources. Trace-driven simulations with real system workloads demonstrate that BBSched improves scheduling performance by up to 41% compared to existing methods, indicating that explicitly optimizing multiple resources beyond CPUs is essential for HPC scheduling.
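The idea of scheduling as multi-objective optimization can be illustrated with a small sketch. It assumes (as an illustration, not as BBSched's exact formulation) that the two objectives are the compute nodes and the burst buffer capacity used by the set of waiting jobs chosen to start, and it returns the Pareto front: the feasible job sets that no other feasible set beats on both resources. The job window, capacities, and function names (`objectives`, `feasible`, `dominates`, `pareto_front`) are hypothetical, and exhaustive enumeration is swapped in for the genetic algorithm the abstract describes, so it only scales to a handful of jobs.

```python
from itertools import combinations

# Hypothetical scheduling window: (job_id, compute_nodes, burst_buffer_TB).
JOBS = [
    ("j1", 40, 2.0),
    ("j2", 30, 8.0),
    ("j3", 20, 1.0),
    ("j4", 50, 0.0),
    ("j5", 10, 6.0),
]
CPU_CAPACITY = 100   # hypothetical number of free compute nodes
BB_CAPACITY = 10.0   # hypothetical free burst buffer capacity in TB


def objectives(subset):
    """Objectives to maximize: (compute nodes used, burst buffer TB used)."""
    return (sum(job[1] for job in subset), sum(job[2] for job in subset))


def feasible(subset):
    """A candidate is feasible if it fits in both free resources at once."""
    nodes, bb = objectives(subset)
    return nodes <= CPU_CAPACITY and bb <= BB_CAPACITY


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(jobs):
    """Return every feasible job set whose objectives no other feasible set dominates.

    Exhaustive enumeration stands in for BBSched's genetic algorithm here;
    it is only practical for a handful of jobs (2**n candidate sets).
    """
    candidates = [
        (subset, objectives(subset))
        for r in range(len(jobs) + 1)
        for subset in combinations(jobs, r)
        if feasible(subset)
    ]
    return [
        (subset, obj)
        for subset, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates)
    ]


if __name__ == "__main__":
    for subset, (nodes, bb) in pareto_front(JOBS):
        print([job[0] for job in subset], nodes, bb)
```

With these made-up numbers the front holds two choices: one set fills all 100 nodes but leaves 1 TB of burst buffer idle, and another fills the burst buffer but leaves 30 nodes idle. Choosing between such non-dominated options is the kind of cross-resource trade-off the abstract says system managers can explore.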


Experience and Practice of Batch Scheduling on Leadership Supercomputers at Argonne

Allcock, Rich, Fan, et al. 2018


Joint Effects of Application Communication Pattern, Job Placement and Network Routing on Fat-Tree Systems

Qiao, Wang, et al. 2018


Among high-radix, low-diameter networks, the fat-tree topology is commonly used in high-performance computing (HPC) and datacenter systems. Resource and job management on HPC systems is critical for mitigating application interference and achieving high system performance and utilization. Preliminary studies have shown that job placement affects the performance of parallel scientific applications on fat-tree networks. In this work, we explore the joint effects of job placement and network routing on fat-tree systems, taking each application's communication pattern into account. Applications can be classified into groups according to their communication patterns. We combine several job placement policies and routing algorithms to create six configurations. System performance is analyzed using communication, hop, traffic, and saturation data collected from fine-grained, high-fidelity discrete-event simulation. Initial experiments show that the performance of HPC applications depends not only on the communication pattern but also on job placement and network routing on fat-tree systems.
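Hop count is one of the metrics the abstract mentions. A static back-of-the-envelope sketch, far simpler than the discrete-event simulation used in the study, shows why placement alone already changes it for a fixed communication pattern. It assumes a 3-level k-ary fat tree (k pods, k/2 edge switches per pod, k/2 hosts per edge switch) with minimal up/down routing, a nearest-neighbor ring pattern, and two hypothetical placement policies, `contiguous` and `spread`; the paper's actual policies and routing algorithms are not named in the abstract and are not reproduced here.

```python
K = 4                          # switch radix of a hypothetical k-ary fat tree (even)
HOSTS_PER_EDGE = K // 2        # hosts attached to each edge switch
HOSTS_PER_POD = (K // 2) ** 2  # hosts per pod
NUM_HOSTS = K ** 3 // 4        # total hosts in the fat tree


def hops(a: int, b: int) -> int:
    """Links traversed between hosts a and b under minimal up/down routing."""
    if a == b:
        return 0
    if a // HOSTS_PER_EDGE == b // HOSTS_PER_EDGE:
        return 2  # host -> edge -> host
    if a // HOSTS_PER_POD == b // HOSTS_PER_POD:
        return 4  # host -> edge -> aggregation -> edge -> host
    return 6      # host -> edge -> aggregation -> core -> aggregation -> edge -> host


def contiguous(n: int) -> list[int]:
    """Hypothetical contiguous policy: rank r runs on host r."""
    assert n <= NUM_HOSTS
    return list(range(n))


def spread(n: int) -> list[int]:
    """Hypothetical spread policy: deal ranks round-robin across pods."""
    assert n <= NUM_HOSTS
    return [(r % K) * HOSTS_PER_POD + r // K for r in range(n)]


def ring_pairs(n: int) -> list[tuple[int, int]]:
    """Nearest-neighbor ring pattern: rank i sends to rank (i + 1) mod n."""
    return [(i, (i + 1) % n) for i in range(n)]


def mean_hops(placement: list[int], pairs: list[tuple[int, int]]) -> float:
    """Average links per message for a placement and a communication pattern."""
    return sum(hops(placement[s], placement[d]) for s, d in pairs) / len(pairs)


if __name__ == "__main__":
    pairs = ring_pairs(8)
    for name, policy in (("contiguous", contiguous), ("spread", spread)):
        print(name, mean_hops(policy(8), pairs))
```

For an 8-rank ring on a k = 4 tree, contiguous placement averages 3.5 links per message, while spreading ranks across pods pushes every message through the core (6 links). Under minimal routing the path length is fixed by placement; the routing algorithm instead decides which aggregation and core links carry the traffic, which affects contention and saturation but is not modeled in this sketch.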


Preliminary Interference Study About Job Placement and Routing Algorithms in the Fat-Tree Topology for HPC Applications

Qiao, Wang, et al. 2017



Yuping Fan

Trade-Off Between Prediction Accuracy and Underestimation Rate in Job Runtime Estimates

Scheduling Beyond CPUs for HPC
