2008 IEEE Fourth International Conference on eScience
DOI: 10.1109/escience.2008.64

SWARM: Scheduling Large-Scale Jobs over the Loosely-Coupled HPC Clusters

Abstract: Compute-intensive scientific applications are heavily reliant on the available quantity of computing resources. The Grid paradigm provides a large-scale computing environment for scientific users. However, conventional Grid job submission tools do not provide a high-level job scheduling environment for these users across multiple institutions. For an extremely large number of jobs, a more scalable job scheduling framework that can leverage highly distributed clusters and supercomputers is required. In th…

Cited by 8 publications (3 citation statements)
References 17 publications

“…Condor-G in turn uses GRAM2 for its job submission and data staging. We have spun this work off into the Swarm project [31]. Data services in the portal include fault models from the QuakeTables fault database [32], GPS archival data from Geophysical Resources Web Service [33], and real-time data from the California Real-Time Network [34].…”
Section: QuakeSim
confidence: 99%
“…This supercomputer log shows that idle resources are not used during this waiting time, causing inefficient resource waste. Therefore, the efficiency of resource allocation has to be evaluated by the scheduling algorithm [9]. It can be seen that optimization can be performed by applying the backfilling.…”
Section: Introduction
confidence: 99%
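The backfilling optimization mentioned in the citation statement above can be illustrated with a small simulation. The following Python sketch is ours, not taken from the cited work: it assumes jobs are (id, nodes, runtime) tuples with user-supplied runtime estimates, and uses a simplified policy that launches any queued job fitting in the idle nodes whenever the head-of-queue job cannot start (a production EASY backfill would additionally protect the head job's reservation so backfilled jobs never delay it).

import heapq

def simulate_backfill(jobs, total_nodes):
    # Simplified backfilling: scan the queue in arrival order and launch
    # every job that fits in the currently idle nodes, so small jobs can
    # jump ahead of a large blocked head-of-queue job.
    queue = list(jobs)               # FIFO arrival order of (id, nodes, runtime)
    running = []                     # min-heap of (finish_time, nodes)
    free, now = total_nodes, 0
    starts = {}                      # job id -> start time

    while queue or running:
        i = 0
        while i < len(queue):
            job_id, nodes, runtime = queue[i]
            if nodes <= free:        # head job, or a backfilled later job
                free -= nodes
                heapq.heappush(running, (now + runtime, nodes))
                starts[job_id] = now
                queue.pop(i)
            else:
                i += 1               # does not fit yet; try the next job
        if running:                  # advance time to the next completion
            now, nodes = heapq.heappop(running)
            free += nodes
    return starts

# On 4 nodes, "B" (1 node) backfills at t=0 while "C" (4 nodes) waits for "A".
print(simulate_backfill([("A", 3, 10), ("C", 4, 5), ("B", 1, 5)], 4))
# {'A': 0, 'B': 0, 'C': 10}

In this simplified form the backfilled jobs reclaim the idle nodes that the supercomputer log in [9] shows going to waste; the trade-off is that, without a reservation for the head job, repeated backfilling can delay it.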
“…Table 2 shows the data/computation flow of these three basic execution units, along with examples. (Raicu, Zhao et al 2007) and SWARM (Pallickara and Pierce 2008) all provide similar functionality by scheduling large numbers of individual maps/jobs. Applications which can utilize a "reduction" or an "aggregation" operation can use both phases of the MapReduce model and, depending on the "associativity" and "transitivity" nature of the reduction operation, multiple reduction phases can be applied to enhance the parallelism.…”
Section: Programming Models
confidence: 99%
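The "associativity" remark in the last citation statement is what allows a reduction to be split into multiple phases. A minimal sketch, assuming an associative and commutative reduce operation so partial results can be combined in any grouping (the helper name here is hypothetical, not an API of the cited systems):

from functools import reduce
from operator import add

def tree_reduce(values, op, fan_in=2):
    # Multi-phase (tree) reduction: each phase merges fan_in partial
    # results, and the groups within a phase are independent, so they
    # could run as parallel reduce tasks.
    while len(values) > 1:
        values = [reduce(op, values[i:i + fan_in])
                  for i in range(0, len(values), fan_in)]
    return values[0]

# Map phase: per-partition counts from eight map tasks;
# reduce phases: merge the partial counts pairwise over three rounds.
partials = [3, 1, 4, 1, 5, 9, 2, 6]
assert tree_reduce(partials, add) == sum(partials)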