2018
DOI: 10.48550/arxiv.1810.00307
Preprint

Mini-batch Serialization: CNN Training with Inter-layer Data Reuse

Abstract: Training convolutional neural networks (CNNs) requires intense computations and high memory bandwidth. We find that bandwidth today is over-provisioned because most memory accesses in CNN training can be eliminated by rearranging computation to better utilize on-chip buffers and avoid traffic resulting from large per-layer memory footprints. We introduce the MBS CNN training approach that significantly reduces memory traffic by partially serializing mini-batch processing across groups of layers. This optimizes…
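
The sketch below illustrates the core idea the abstract describes: rather than pushing the full mini-batch through every layer, each group of layers processes the mini-batch in smaller sub-batches, so that the group's intermediate activations stay within a small, buffer-sized working set. This is not the paper's actual MBS scheduler; the layer-group boundaries, sub-batch size, and network are illustrative assumptions.

```python
# Minimal sketch of partially serialized mini-batch processing across
# groups of layers (the MBS idea, not the authors' implementation).
import torch
import torch.nn as nn

def forward_serialized(layer_groups, x, sub_batch_size):
    """Run x through each group of layers sub-batch by sub-batch."""
    for group in layer_groups:
        outputs = []
        # Serialize the mini-batch within this group so intermediate
        # activations are produced for only sub_batch_size samples at a time.
        for chunk in torch.split(x, sub_batch_size, dim=0):
            outputs.append(group(chunk))
        # Re-assemble the full mini-batch before moving to the next group.
        x = torch.cat(outputs, dim=0)
    return x

if __name__ == "__main__":
    # Hypothetical small CNN split into two layer groups.
    groups = [
        nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
        nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
    ]
    batch = torch.randn(32, 3, 64, 64)                 # full mini-batch of 32
    out = forward_serialized(groups, batch, sub_batch_size=8)
    print(out.shape)                                   # torch.Size([32, 32, 64, 64])
```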

Cited by 2 publications (2 citation statements)
References 25 publications
“…Eyeriss [50], DaDiannao [162], Tetris [131], and Minerva [163]). WaveCore [164] and Google's TPUv2 [97] support CNN training, but suffer from challenges highlighted in Section 3. EcoFlow solves these issues, while introducing minimal changes to the CNN inference accelerator architecture.…”
Section: Related Work (mentioning)
confidence: 99%
“…This is mainly because the inputs of each layer at forward propagation should be kept in memory and reused to compute the local gradients in back-propagation. In particular, the total size of all layer inputs linearly increases with mini-batch size [27]. Therefore, small off-chip memory capacity or a large feature size of a CNN can constrain the mini-batch size per accelerator, and hence also the data parallelism of each layer.…”
Section: CNN Model Training (mentioning)
confidence: 99%
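
The citation above notes that the activations saved during forward propagation for reuse in back-propagation grow linearly with mini-batch size. The back-of-the-envelope calculation below makes that scaling concrete; the layer input shapes are hypothetical, but the linear relationship is the same for any network.

```python
# Activation footprint vs. mini-batch size (illustrative shapes, fp32).
BYTES_PER_ELEMENT = 4  # fp32

# (channels, height, width) of each saved layer input, per sample.
layer_input_shapes = [
    (3, 224, 224),
    (64, 112, 112),
    (128, 56, 56),
    (256, 28, 28),
]

def activation_bytes(mini_batch_size):
    """Total bytes of layer inputs kept for back-propagation."""
    per_sample = sum(c * h * w for c, h, w in layer_input_shapes)
    return mini_batch_size * per_sample * BYTES_PER_ELEMENT

for n in (8, 32, 128):
    print(f"mini-batch {n:4d}: {activation_bytes(n) / 2**20:8.1f} MiB")
# Doubling the mini-batch size doubles the activation footprint, which is
# what constrains the per-accelerator mini-batch size when off-chip memory
# is small or per-layer feature maps are large.
```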