In the fusion community, the use of high-performance computing (HPC) has been dominated mostly by heavy-duty plasma simulations, such as those based on particle-in-cell and gyrokinetic codes. However, there has been growing interest in applying machine learning for knowledge discovery on top of the large amounts of experimental data collected from fusion devices. Deep learning models, in particular, are especially hungry for accelerator hardware, such as graphics processing units (GPUs), and it is becoming more common to find these models competing for the same resources used by simulation codes, which can be either CPU- or GPU-bound. In this paper, we give examples of deep learning models, such as convolutional neural networks, recurrent neural networks, and variational autoencoders, that can be used for a variety of tasks, including image processing, disruption prediction, and anomaly detection on diagnostics data. In this context, we discuss how deep learning can scale from a single GPU on a single node to multiple GPUs across multiple nodes in a large-scale HPC infrastructure.
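The core idea behind scaling training from one GPU to many is data parallelism: each worker (one GPU in practice) computes gradients on its own shard of a batch, and an all-reduce step averages those gradients so every worker applies the same update. The following minimal NumPy sketch, a hypothetical illustration not taken from the paper, shows why the averaged per-shard gradient matches the full-batch gradient when shards are equal-sized:

```python
import numpy as np

# Data-parallel training sketch: each "worker" (a GPU in practice)
# computes the gradient on its shard of the batch; an "all-reduce"
# then averages the gradients so all workers stay synchronized.

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))   # full batch of input features
y = rng.normal(size=(64, 1))   # regression targets
w = np.zeros((8, 1))           # shared model weights

def grad_mse(Xs, ys, w):
    # Gradient of the mean squared error on one shard
    n = Xs.shape[0]
    return Xs.T @ (Xs @ w - ys) / n

# Shard the batch across 4 simulated workers (equal shard sizes)
shards = np.array_split(np.arange(64), 4)
local_grads = [grad_mse(X[idx], y[idx], w) for idx in shards]

# "All-reduce": average the per-worker gradients
g_avg = np.mean(local_grads, axis=0)

# With equal shards, the average equals the full-batch gradient,
# so distributed training follows the same trajectory as one GPU
g_full = grad_mse(X, y, w)
assert np.allclose(g_avg, g_full)

w -= 0.1 * g_avg               # one synchronized SGD step
```

In production this averaging is done by a communication library rather than by hand, with frameworks distributing one process per GPU across the nodes of the HPC cluster.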