2019 IEEE International Conference on Cluster Computing (CLUSTER) 2019
DOI: 10.1109/cluster.2019.8891014

NORNS: Extending Slurm to Support Data-Driven Workflows through Asynchronous Data Staging

Abstract: As HPC systems move into the Exascale era, parallel file systems are struggling to keep up with the I/O requirements from data-intensive problems. While the inclusion of burst buffers has helped to alleviate this by improving I/O performance, it has also increased the complexity of the I/O hierarchy by adding additional storage layers each with its own semantics. This forces users to explicitly manage data movement between the different storage layers, which, coupled with the lack of interfaces to communicate …

Cited by 12 publications (5 citation statements)
References 30 publications
“…Moreover, for effective data scheduling, these models could be extrapolated to different node and process configurations. While several services were proposed to support data staging [109]- [112], to the best of our knowledge, none consider these issues.…”
Section: B Data Scheduler and Management (mentioning)
confidence: 99%
“…Figure 2 shows the project's research around system software support. The research shown here was jointly undertaken by EPCC and Barcelona Supercomputing Centre, and investigated how workflows with data dependencies can be supported on a system with persistent memory [7]. The SLURM resource manager was modified to be aware of the persistent memory, and it was extended to enable users to reconfigure and reboot compute nodes with specific configurations for their jobs.…”
Section: I/O Performance (mentioning)
confidence: 99%
“…The advantage of node local storage such as B-APM is that it is possible to leave data on the compute nodes. The NORNS [12] system software layer, which is integrated with the SLURM resource manager, supports the asynchronous staging of data, and the data marshalling between workflow components. As part of the job submission, a user can specify the data dependencies between producer-consumer style workflow components.…”
Section: Workflows With Data Dependencies (mentioning)
confidence: 99%
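
The producer-consumer dependencies described in the statement above can be illustrated with a minimal sketch. It is not the NORNS or Slurm interface: the job names, dataset names, and the `execution_order` helper are all hypothetical, and the sketch only shows how declared inputs and outputs determine a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow description: each job lists the datasets it
# consumes and produces. All names are illustrative assumptions.
jobs = {
    "simulate": {"inputs": [], "outputs": ["raw.dat"]},
    "analyse": {"inputs": ["raw.dat"], "outputs": ["stats.dat"]},
    "visualise": {"inputs": ["stats.dat"], "outputs": []},
}


def execution_order(jobs):
    """Return job names ordered so every producer precedes its consumers."""
    # Map each dataset to the job that produces it.
    producer_of = {
        dataset: name
        for name, spec in jobs.items()
        for dataset in spec["outputs"]
    }
    # A job depends on the producers of its inputs; inputs with no
    # producer in the workflow are treated as already available.
    graph = {
        name: {producer_of[d] for d in spec["inputs"] if d in producer_of}
        for name, spec in jobs.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(jobs)
```

In a real deployment the scheduler, not the user script, would derive this order and decide where data is staged; the sketch covers only the dependency resolution step.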
“…The advantage of node local storage such as B-APM is that it is possible to leave data on the compute nodes. The NORNS [12] … [Figure: App C: Mixed mode with large memory and distributed storage using GekkoFS; copy data from fsdax to GekkoFS (Fig. 8)]…”
Section: Workflows With Data Dependencies (mentioning)
confidence: 99%