Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2020
DOI: 10.1145/3373376.3378465
AutoTM: Automatic Tensor Movement in Heterogeneous Memory Systems using Integer Linear Programming

Abstract: Memory capacity is a key bottleneck for training large-scale neural networks. Intel® Optane™ DC PMM (persistent memory modules), which are available as NVDIMMs, are a disruptive technology that promises significantly higher read bandwidth than traditional SSDs at a lower cost per bit than traditional DRAM. In this work we show how to take advantage of this new memory technology to minimize the amount of DRAM required without significantly compromising performance. Specifically, we take advantage of the static na…
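The paper's title names the core technique: casting tensor movement between DRAM and PMM as an integer linear program. As a rough illustration of that idea only (not AutoTM's actual formulation, which schedules tensor movement across the computation graph), the sketch below uses PuLP to choose which tensors to keep in DRAM under a capacity budget; the tensor names, sizes, and slowdown costs are made up.

```python
# A minimal ILP sketch of DRAM-vs-PMM tensor placement, assuming PuLP is
# installed. Each tensor gets a binary variable: 1 = keep in DRAM, 0 = PMM.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

# (name, size in MiB, estimated slowdown in ms if the tensor stays in PMM)
tensors = [("conv1_act", 256, 4.0), ("conv2_act", 512, 9.0),
           ("fc1_weights", 128, 1.5), ("grad_buffer", 512, 7.0)]
DRAM_BUDGET_MIB = 768

prob = LpProblem("tensor_placement", LpMinimize)
in_dram = {name: LpVariable(f"dram_{name}", cat=LpBinary)
           for name, _, _ in tensors}

# Objective: total slowdown incurred by tensors left in PMM.
prob += lpSum(cost * (1 - in_dram[name]) for name, _, cost in tensors)
# Constraint: tensors placed in DRAM must fit within the DRAM budget.
prob += lpSum(size * in_dram[name] for name, size, _ in tensors) <= DRAM_BUDGET_MIB

prob.solve()
for name, size, _ in tensors:
    tier = "DRAM" if in_dram[name].value() == 1 else "PMM"
    print(f"{name:12s} ({size:4d} MiB) -> {tier}")
```

The real formulation additionally models when each tensor is live and the cost of moving it, so placement can change over the course of a training iteration rather than being fixed once.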

Cited by 50 publications (22 citation statements); references 32 publications.
“…A straightforward method to reduce the DRAM footprint is to asynchronously flush cached regions into NVM and reclaim them in advance. However, prior work [20] has reported negative results for this method. Although asynchronous flushing can reduce DRAM consumption, it introduces NVM write operations, which may reduce available NVM bandwidth and worsen GC performance.…”
Section: Asynchronous Region Flushing
confidence: 88%
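To make the quoted trade-off concrete, here is a minimal sketch (not the cited system's implementation) of asynchronous region flushing: a background thread writes dirty DRAM-cached regions to an NVM-backed file and drops the DRAM copy. Every flush is an extra NVM write, which is exactly the traffic the quote says competes with foreground NVM bandwidth. The path and region size are illustrative, and the NVM mount point is assumed.

```python
# Sketch of asynchronous region flushing; nvm_path is an assumed DAX mount.
import threading, queue

nvm_path = "/mnt/pmem0/regions.bin"
dram_cache = {}                # region_id -> bytes held in DRAM
flush_queue = queue.Queue()

def flush_worker():
    with open(nvm_path, "wb") as nvm:
        while True:
            region_id = flush_queue.get()
            if region_id is None:
                break
            data = dram_cache.pop(region_id, None)  # reclaim the DRAM copy
            if data is not None:
                nvm.write(data)   # this is the added NVM write traffic
                nvm.flush()

threading.Thread(target=flush_worker, daemon=True).start()

dram_cache["r1"] = b"\x00" * (4 << 20)   # 4 MiB region cached in DRAM
flush_queue.put("r1")                     # schedule its flush to NVM
```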
“…When the number of write operations increases, the overall bandwidth declines sharply. This problem is possibly caused by NVM's asymmetric bandwidth: its peak read bandwidth is much higher than its peak write bandwidth [20,24]. Other NVM technologies, such as phase-change memory (PCM), have similar problems [34].…”
Section: Detailed Bandwidth Analysis
confidence: 99%
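A back-of-the-envelope model shows why a growing write fraction drags down overall bandwidth when the read and write peaks are asymmetric: the mixed stream is dominated by the much slower write path. The peak numbers below are assumptions for illustration, not measurements of Optane DC PMM.

```python
# Harmonic-mean model of effective bandwidth for an interleaved read/write
# stream with asymmetric peak bandwidths (illustrative numbers).
READ_BW_GBPS = 6.0    # assumed peak read bandwidth
WRITE_BW_GBPS = 2.0   # assumed peak write bandwidth (much lower)

def effective_bandwidth(write_fraction: float) -> float:
    """Effective bandwidth of a stream with the given write fraction."""
    read_fraction = 1.0 - write_fraction
    return 1.0 / (read_fraction / READ_BW_GBPS + write_fraction / WRITE_BW_GBPS)

for wf in (0.0, 0.1, 0.3, 0.5):
    print(f"write fraction {wf:.1f}: ~{effective_bandwidth(wf):.2f} GB/s")
```

Even a 30% write fraction roughly halves effective bandwidth under these assumed peaks, which matches the qualitative trend the quote describes.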
“…The input sizes of each operation are obtained by offline analysis. For many DNN training workloads, once their hyperparameters (e.g., batch size) are determined, the input sizes for each operation can be known before training takes place [24,32,33,54].…”
Section: Offline FPGA Kernel Optimization
confidence: 99%
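The kind of offline analysis the quote refers to is straightforward once the hyperparameters are fixed: every operation's input, output, and weight byte counts follow directly from the layer shapes and the batch size. The two-layer MLP below is purely illustrative.

```python
# Compute per-operation tensor sizes from fixed hyperparameters (a sketch).
BATCH = 64
DTYPE_BYTES = 4  # float32
layers = [("fc1", 784, 1024), ("fc2", 1024, 10)]  # (name, in_dim, out_dim)

for name, in_dim, out_dim in layers:
    input_bytes = BATCH * in_dim * DTYPE_BYTES
    output_bytes = BATCH * out_dim * DTYPE_BYTES
    weight_bytes = in_dim * out_dim * DTYPE_BYTES
    print(f"{name}: in={input_bytes / 2**20:.2f} MiB, "
          f"out={output_bytes / 2**20:.2f} MiB, "
          f"weights={weight_bytes / 2**20:.2f} MiB")
```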
“…We target common DNN models whose dataflow graphs do not exhibit data-dependent control flow, so each training step goes through exactly the same graph, which implies that the input sizes of operations can be known before training. Such DNN models are very common and have been the targets of recent works [24,32,33,54,73].…”
Section: Introduction
confidence: 99%
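The static-graph property is what makes the offline analysis above possible: with no data-dependent control flow, a single symbolic pass over the graph determines every intermediate tensor's shape, and the same schedule repeats on every training step. The toy graph and shape rules below are made up for illustration.

```python
# One topological pass over a static dataflow graph to precompute shapes
# and byte footprints before any data is seen (illustrative graph).
graph = {  # op -> (input ops, shape rule as a function of input shapes)
    "input": ([], lambda: (64, 3, 224, 224)),
    "conv1": (["input"], lambda s: (s[0], 64, s[2] // 2, s[3] // 2)),
    "pool1": (["conv1"], lambda s: (s[0], s[1], s[2] // 2, s[3] // 2)),
}

shapes = {}
for op, (inputs, shape_rule) in graph.items():  # insertion order is topological
    shapes[op] = shape_rule(*(shapes[i] for i in inputs))
    elems = 1
    for d in shapes[op]:
        elems *= d
    print(f"{op}: shape={shapes[op]}, bytes={elems * 4}")  # float32
```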
“…As a result, PM and DRAM form a heterogeneous memory (HM) system. How to place and migrate data between PM and DRAM to exploit both the speed of DRAM and the capacity of PM remains an active research question [7,11,22,26,39,40].…”
Section: Introduction
confidence: 99%
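One generic family of answers to the placement question is hotness-based migration. The sketch below is not any cited system's policy, only an illustration: pages that accumulate enough accesses are promoted to DRAM, and the coldest DRAM-resident page is demoted to PM when the DRAM budget is full. Capacities and thresholds are made up.

```python
# A toy access-count-based promotion/demotion policy for DRAM/PM placement.
from collections import defaultdict

DRAM_CAPACITY_PAGES = 2
PROMOTE_THRESHOLD = 3

access_counts = defaultdict(int)
in_dram = set()

def touch(page: int) -> None:
    """Record an access; promote the page to DRAM once it becomes hot."""
    access_counts[page] += 1
    if page in in_dram or access_counts[page] < PROMOTE_THRESHOLD:
        return
    if len(in_dram) >= DRAM_CAPACITY_PAGES:
        coldest = min(in_dram, key=lambda p: access_counts[p])
        in_dram.remove(coldest)   # demote the coldest page back to PM
    in_dram.add(page)             # promote the hot page into DRAM

for p in [1, 1, 1, 2, 3, 3, 3, 3, 1]:
    touch(p)
print("DRAM-resident pages:", sorted(in_dram))
```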