2022
DOI: 10.1007/s10586-022-03798-7

A container-based workflow for distributed training of deep learning algorithms in HPC clusters

Abstract: Deep learning has been postulated as a solution for numerous problems in different branches of science. Given the resource-intensive nature of these models, they often need to be executed on specialized hardware such as graphical processing units (GPUs) in a distributed manner. In the academic field, researchers get access to this kind of resource through High Performance Computing (HPC) clusters. This kind of infrastructure makes the training of these models difficult due to their multi-user nature and limited …
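
As a rough illustration of the kind of workflow the abstract describes (containerized, multi-node GPU training on a shared HPC cluster), the following is a minimal sketch of a Slurm batch script that launches training from a container with Apptainer. The image name, training script, and resource values are hypothetical and not taken from the paper:

  #!/bin/bash
  #SBATCH --job-name=dl-train        # hypothetical job name
  #SBATCH --nodes=2                  # two GPU nodes, one task each
  #SBATCH --ntasks-per-node=1
  #SBATCH --gpus-per-node=1

  # Build a local image from a registry image; on a multi-user cluster
  # this runs unprivileged, which is why runtimes of this kind are
  # favoured over plain Docker on HPC systems.
  apptainer pull train.sif docker://example/dl-train:latest

  # --nv exposes the node's NVIDIA driver and GPUs inside the container;
  # srun starts one containerized training process per allocated node.
  srun apptainer exec --nv train.sif python train.py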

Cited by 8 publications (2 citation statements)
References 51 publications
“…In these nodes, using one GPU, CNN-DeepESD training takes about 2 seconds per epoch, CNN-PAN 3 seconds and CNN-UNET 50 seconds. To ease the reproducibility of these experiments, as well as to simplify their execution on HPC clusters, we provide the required scripts to follow the workflow presented in González-Abad, López García, and Kozlov (2022). Dockerfiles are also available in the GitHub repository.…”
Section: Data and Code Availability Statement (mentioning, confidence: 99%)
“…We train the models in nodes equipped with graphical processing units (GPUs), more specifically NVIDIA Tesla V100 GPUs. In these nodes, using one GPU, CNN‐DeepESD training takes about 2 s per epoch, CNN‐PAN 3 s and CNN‐UNET 50 s. To ease the reproducibility of these experiments, as well as to simplify their execution on HPC clusters, we provide the required scripts to follow the workflow presented in González‐Abad, López García, and Kozlov (2022). Dockerfiles are also included with the code.…”
Section: Data Availability Statement (mentioning, confidence: 99%)
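
Both statements point to Dockerfiles shipped alongside the citing papers' code. As a hedged sketch of what such an image definition for GPU-based CNN training commonly looks like (the base image, dependency file, and entry point here are assumptions, not contents of the cited repository):

  # Hypothetical Dockerfile for GPU training; the actual files live
  # in the citing papers' GitHub repository.
  FROM tensorflow/tensorflow:2.11.0-gpu
  WORKDIR /app
  # Install pinned Python dependencies first so the layer is cached
  # across image rebuilds.
  COPY requirements.txt .
  RUN pip install --no-cache-dir -r requirements.txt
  COPY . .
  ENTRYPOINT ["python", "train.py"]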