2018
DOI: 10.1371/journal.pcbi.1006468

Wrangling distributed computing for high-throughput environmental science: An introduction to HTCondor

Abstract: Biologists and environmental scientists now routinely solve computational problems that were unimaginable a generation ago. Examples include processing geospatial data, analyzing -omics data, and running large-scale simulations. Conventional desktop computing cannot handle these tasks when they are large, and high-performance computing is not always available nor the most appropriate solution for all computationally intense problems. High-throughput computing (HTC) is one method for handling computationally in…

Cited by 17 publications (12 citation statements)
References 24 publications
“…The entire SINGE workflow for all hyperparameters and replicates requires 1,219.4 h. However, Jump3 and SINGE are highly parallelizable. We deployed them on our local high-throughput computing cluster using HTCondor (Erickson et al, 2018), which connects to the OSG (Pordes et al, 2007). In this high-throughput setting, we can run the entire SINGE algorithm in 36 h and the Jump3 algorithm in 72 h. SINGE can also be configured to run on a single workstation with appropriate changes to the hyperparameters.…”
Section: Computational Runtime
Citation type: mentioning
confidence: 99%
“…HPC works best with large computer tasks that are often broken down into tightly linked smaller tasks. In contrast, HTC works best with large computer tasks that can be broken down into many smaller tasks and run independently (Erickson et al, 2018). Historically, HPC required large, onsite computers, such as the super‐computers, found on major research universities or national laboratories.…”
Section: Concepts
Citation type: mentioning
confidence: 99%
“…High Throughput Computing (HTC) represents a method for handling this computationally intense task, which let the users split a big task into many small independent ones (a.k.a. jobs) that are distributed across a computer cluster [9].…”
Section: Workflow Implementation
Citation type: mentioning
confidence: 99%