2019
DOI: 10.48550/arxiv.1904.10631
Preprint

Low-Memory Neural Network Training: A Technical Report

Abstract: Memory is increasingly often the bottleneck when training neural network models. Despite this, techniques to lower the overall memory requirements of training have been less widely studied compared to the extensive literature on reducing the memory requirements of inference. In this paper we study a fundamental question: How much memory is actually needed to train a neural network? To answer this question, we profile the overall memory usage of training on two representative deep learning benchmarks - the WideR…
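
As a concrete illustration of what such a profile covers, the sketch below, which assumes PyTorch and an arbitrary toy model rather than the benchmarks used in the report, tallies the usual components of training memory: weights, gradients, optimizer state, and the tensors autograd saves for the backward pass. The model, batch size, and layer widths are placeholders chosen for illustration.

    # A minimal sketch, assuming PyTorch and an arbitrary toy model (not the
    # benchmarks from the report): tally the main components of training memory.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Flatten(), nn.Linear(64 * 32 * 32, 10),
    )
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    x = torch.randn(128, 3, 32, 32)            # one illustrative training batch
    y = torch.randint(0, 10, (128,))

    # Count every tensor autograd saves for the backward pass. This is an upper
    # bound on activation memory, since some saved tensors are parameters.
    saved_bytes = 0
    def pack(t):
        global saved_bytes
        saved_bytes += t.numel() * t.element_size()
        return t

    with torch.autograd.graph.saved_tensors_hooks(pack, lambda t: t):
        loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()                                  # materializes momentum buffers

    param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    grad_bytes = sum(p.grad.numel() * p.grad.element_size()
                     for p in model.parameters() if p.grad is not None)
    opt_bytes = sum(t.numel() * t.element_size()
                    for state in opt.state.values() for t in state.values()
                    if torch.is_tensor(t))

    for name, b in [("weights", param_bytes), ("gradients", grad_bytes),
                    ("optimizer state", opt_bytes), ("saved tensors", saved_bytes)]:
        print(f"{name:>16}: {b / 2**20:6.1f} MiB")

For a convolutional model at this batch size, the saved-tensor line typically dominates, which is the pattern the report and several of the citing papers below emphasize.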

Cited by 13 publications (16 citation statements)
References 37 publications
“…However, since convolutional layers are computationally more expensive than fully-connected layers (i.e., our target to improve in this work), as analyzed in [17], and the real bottleneck of on-device training is memory bound as shown in Section 4, analyzing and improving computationally expensive convolutional layers is a potential future direction. Our finding of the memory bottleneck issue also suggests investigating how to enable on-device training with memory optimization in terms of the model, the optimizer, and the activations [18].…”
Section: Profiling Compute and Memory Operations For Training
confidence: 76%
“…In order to quantify the potential gains from approximation, we conducted a variable representation and lifetime analysis of Algorithm 1 following the approach taken by Sohoni et al (2019). Table 2 lists the properties of all variables in Algorithm 1, with each variable's contributions to the total footprint shown for a representative example.…”
Section: Variable Analysis
confidence: 99%
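
To make the kind of tally described in that excerpt concrete, here is a hypothetical sketch of a variable footprint-and-lifetime analysis; the variable names, sizes, and live spans are invented for illustration and are not drawn from the cited paper.

    # Hypothetical variable footprint and lifetime tally; all entries invented.
    from dataclasses import dataclass

    DTYPE_BYTES = {"float32": 4, "float16": 2, "int8": 1}

    @dataclass
    class Var:
        name: str
        numel: int      # number of elements
        dtype: str
        live: tuple     # (first step alive, last step alive) within one iteration

        @property
        def bytes(self) -> int:
            return self.numel * DTYPE_BYTES[self.dtype]

    variables = [
        Var("weights",         1_000_000, "float32", (0, 9)),
        Var("momentum buffer", 1_000_000, "float32", (0, 9)),
        Var("activations",     8_000_000, "float32", (1, 7)),
        Var("gradients",       1_000_000, "float32", (5, 9)),
    ]

    total = sum(v.bytes for v in variables)
    for v in variables:
        print(f"{v.name:>16}: {v.bytes / 2**20:6.1f} MiB "
              f"({100 * v.bytes / total:4.1f}% of total)")

    # Peak footprint follows from lifetimes: variables whose live spans do not
    # overlap could in principle share the same memory.
    peak = max(
        sum(v.bytes for v in variables if v.live[0] <= t <= v.live[1])
        for t in range(10)
    )
    print(f"peak concurrent footprint: {peak / 2**20:.1f} MiB "
          f"of {total / 2**20:.1f} MiB total")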
“…Despite featuring binary forward propagation, existing BNN training approaches perform backward propagation using high-precision floating-point data types (typically float32), often making training infeasible on resource-constrained devices. The high-precision activations used between forward and backward propagation commonly constitute the largest proportion of the total memory footprint of a training run (Sohoni et al., 2019; Cai et al., 2020). Moreover, backward propagation with high-precision gradients is costly, challenging the energy limitations of edge platforms.…”
Section: Introduction
confidence: 99%
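
To illustrate the scale of the imbalance that excerpt describes, here is a back-of-the-envelope sketch; the layer widths, batch size, and feature-map size are assumptions chosen for illustration, not figures from the cited papers.

    # Hypothetical estimate: float32 activations saved between forward and
    # backward vs. the same tensors at 1 bit per value, vs. float32 weights.
    batch, height, width = 128, 32, 32      # illustrative batch and feature-map size
    channels = [64, 64, 128, 128, 256]      # illustrative layer widths

    # Activation elements saved per layer (spatial size kept fixed for simplicity).
    activation_elems = sum(batch * c * height * width for c in channels)

    # 3x3 convolution weights, chaining an RGB input through the widths above.
    weight_elems = sum(3 * 3 * c_in * c_out
                       for c_in, c_out in zip([3] + channels[:-1], channels))

    print(f"float32 activations: {activation_elems * 4 / 2**20:8.1f} MiB")
    print(f"1-bit activations:   {activation_elems / 8 / 2**20:8.1f} MiB")
    print(f"float32 weights:     {weight_elems * 4 / 2**20:8.1f} MiB")

Even in this crude accounting, the float32 activations outweigh the weights by roughly two orders of magnitude, which is why the activation footprint is the natural target for low-memory training.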
“…By constraining the trainable parameters, such as the weights, to be updated only by local variables (the information contained in the neurons that share the same parameter), we can reduce the memory required to load a model on hardware such as CPUs and GPUs. This constraint can save memory resources and has many potential applications, from low-memory devices [7,8] to training with large batch sizes [9,10] and, even further, to training very large neural networks [11].…”
Section: Introduction
confidence: 99%