The ability of Graphics Processing Units (GPUs) to quickly train data- and compute-intensive deep networks has led to rapid advancements across diverse domains such as robotics, medical imaging and autonomous driving. However, the memory constraints of GPU-based training for memory-intensive deep networks have forced researchers to adopt various workarounds: 1) resize the input image, 2) divide the input image into smaller patches, or 3) use smaller batch sizes, in order to fit both the model and the batch of training data into GPU memory. While these alternatives perform well when dealing with natural images, they suffer from 1) loss of high-resolution information, 2) loss of global context and 3) sub-optimal batch sizes. Such issues are likely to become more pressing in domains like medical imaging, where data is scarce and images are often of very high resolution with subtle features. Therefore, in this paper, we demonstrate that training can be made more data-efficient by using a distributed training setup with high-resolution images and larger effective batch sizes, with batches being distributed across multiple nodes. The distributed GPU training framework, which partitions the data and shares only the model parameters across different GPUs, circumvents the memory constraints of single-GPU training. We conduct a study in which experiments are performed for different image resolutions (ranging from 112 × 112 to 1024 × 1024) and different numbers of images per class to determine the effect of image resolution on network performance. We illustrate our findings on two medical imaging datasets, namely the SD-198 skin-lesion dataset and the NIH Chest X-rays dataset.
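The abstract describes a data-parallel setup in which each GPU holds a full model replica, the training data is sharded across GPUs, and only model parameters (via gradient averaging) are exchanged. The snippet below is a minimal sketch of such a setup using PyTorch's DistributedDataParallel; the ResNet-50 backbone, the 1024 × 1024 resize, the per-GPU batch size, the data path, and all hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal data-parallel training sketch with PyTorch DistributedDataParallel.
# All model/dataset/hyperparameter choices below are illustrative placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler
from torchvision import datasets, transforms, models

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each process holds a full model copy; gradients are averaged across
    # processes during backward, so only model parameters are shared.
    model = models.resnet50(num_classes=198).cuda(local_rank)  # e.g. SD-198 classes
    model = DDP(model, device_ids=[local_rank])

    # The sampler partitions the dataset so every GPU sees a disjoint shard,
    # giving an effective batch size of per_gpu_batch * world_size.
    transform = transforms.Compose([
        transforms.Resize((1024, 1024)),  # high-resolution input (illustrative)
        transforms.ToTensor(),
    ])
    dataset = datasets.ImageFolder("data/train", transform=transform)  # hypothetical path
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=4, sampler=sampler, num_workers=4)

    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    for epoch in range(10):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for images, labels in loader:
            images = images.cuda(local_rank, non_blocking=True)
            labels = labels.cuda(local_rank, non_blocking=True)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()  # gradients all-reduced across GPUs here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Under these assumptions, the script would be launched with one process per GPU, e.g. `torchrun --nproc_per_node=4 --nnodes=2 train.py`, so the effective batch size scales with the total number of GPUs across nodes while each GPU only needs to hold its own shard of the batch.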