2018
DOI: 10.1145/3243904

Layer-Centric Memory Reuse and Data Migration for Extreme-Scale Deep Learning on Many-Core Architectures

Abstract: Owing to the popularity of Deep Neural Network (DNN) models, we have witnessed extreme-scale DNN models whose depth and width continue to grow. However, their extremely high memory requirements make it difficult to run the training process on a single many-core architecture such as a Graphics Processing Unit (GPU), which compels researchers to resort to model parallelism over multiple GPUs. Model parallelism, in turn, always brings heavy additional overhead. The…
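As context for the "layer-centric memory reuse and data migration" named in the title, the following is a minimal sketch of the general idea of migrating a layer's activations to pinned host memory during the forward pass and prefetching them back before the backward pass. It assumes a PyTorch-style setting rather than the paper's actual system; the tensor shape, stream handling, and helper names (`offload`, `prefetch`) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (illustrative, not the paper's implementation): layer-wise data
# migration moves a layer's forward activations to pinned host memory once the GPU
# no longer needs them, then copies them back just before the backward pass.
import torch

def offload(t, stream):
    """Asynchronously copy a GPU tensor into pinned host memory on `stream`."""
    host = torch.empty(t.shape, dtype=t.dtype, pin_memory=True)
    stream.wait_stream(torch.cuda.current_stream())  # wait for the producer of `t`
    with torch.cuda.stream(stream):
        host.copy_(t, non_blocking=True)
    return host

def prefetch(host, stream):
    """Asynchronously copy a pinned host tensor back onto the GPU on `stream`."""
    with torch.cuda.stream(stream):
        return host.to("cuda", non_blocking=True)

if torch.cuda.is_available():
    copy_stream = torch.cuda.Stream()                     # overlaps copies with compute
    act = torch.randn(64, 256, 56, 56, device="cuda")     # a hypothetical layer's activations
    act_host = offload(act, copy_stream)                  # forward: migrate to host
    del act                                               # GPU memory can now be reused
    act_back = prefetch(act_host, copy_stream)            # before backward: migrate back
    torch.cuda.current_stream().wait_stream(copy_stream)  # compute waits for the prefetch
```

Running the copies on a separate CUDA stream is what allows the migration to overlap with the computation of other layers; without that overlap, the transfers would simply add to the iteration time.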

Cited by 30 publications (34 citation statements) · References 26 publications · Citing publications published 2019–2024

Citation statements (ordered by relevance):
“…We do the evaluation in the following three aspects: First, the execution time potential of Layup is evaluated, comparing with some existing state-of-the-art works, including vDNN [32], Layrub [25], Caffe [24], and SuperNeurons [42]. Since SuperNeurons is the most effective approach among them and was put forward recently, we carry out a separate comparison with this technique in more detail.…”
Section: Discussion (mentioning, confidence: 99%)
“…AlexNet is used for this comparison. Both of these memory optimizations are evaluated based on their best performance implementations [4,25] on Caffe. As shown in Figures 2(a) and 2(b), we make the following findings: the CPU-GPU transfer outperforms the extra forward computation in the CONV1-CONV5 and FC6-FC8 layers (by an average speedup of 7.4×), but underperforms for the rest of the layers.…”
Section: Issue 1: Performance Costs of Memory-Optimized Methods (mentioning, confidence: 99%)
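The trade-off described in this excerpt can be made concrete with a small timing sketch. The code below is a hypothetical PyTorch microbenchmark, not the Caffe-based setups evaluated in [4, 25]: it times re-running a convolution's forward pass ("extra forward computation") against round-tripping its output through pinned host memory ("CPU-GPU transfer") for an AlexNet-like layer shape; the batch size and layer dimensions are assumptions.

```python
# Minimal sketch (assumed PyTorch setup, not the cited Caffe implementations):
# compare recomputing a layer's output with transferring it to host memory and back.
import time
import torch

def timed(fn, iters=20):
    """Average wall-clock time of fn(), synchronizing the GPU around the loop."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

if torch.cuda.is_available():
    # Roughly AlexNet's CONV2 shape (96 -> 256 channels, 5x5 kernel, 27x27 input).
    conv = torch.nn.Conv2d(96, 256, kernel_size=5, padding=2).cuda()
    x = torch.randn(128, 96, 27, 27, device="cuda")
    with torch.no_grad():
        y = conv(x)
    y_host = torch.empty(y.shape, dtype=y.dtype, pin_memory=True)

    def recompute():                 # "extra forward computation"
        with torch.no_grad():
            return conv(x)

    def transfer():                  # "CPU-GPU transfer" (device -> host -> device)
        y_host.copy_(y)
        return y_host.to("cuda", non_blocking=True)

    print(f"recompute: {timed(recompute) * 1e3:.2f} ms")
    print(f"transfer : {timed(transfer) * 1e3:.2f} ms")
```

Which side wins depends on the layer's arithmetic cost relative to its activation size and on the available PCIe bandwidth, which is consistent with the excerpt's finding that the transfer path outperforms recomputation for CONV1-CONV5 and FC6-FC8 but not for the remaining layers.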