2020
DOI: 10.48550/arxiv.2012.13113
Preprint

FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training

Abstract: Recent breakthroughs in deep neural networks (DNNs) have fueled a tremendous demand for intelligent edge devices featuring on-site learning, while the practical realization of such systems remains a challenge due to the limited resources available at the edge and the required massive training costs for state-of-the-art (SOTA) DNNs. As reducing precision is one of the most effective knobs for boosting training time/energy efficiency, there has been a growing interest in low-precision DNN training. In this paper…

Cited by 10 publications (11 citation statements)
References 28 publications
“…The significant reduction of DRAM access is the major source of Shift-BNN's high energy efficiency. As various lower-precision training techniques [4,23,62] … Scalability to larger sample size. In some high-risk applications, one may need a more robust BNN model to make decisions, thus requiring training BNNs with a larger sample size to strictly approximate the loss function in Eq. 1.…”
Section: Evaluation Results (mentioning)
confidence: 99%
“…As the unit energy cost (J/bit) of off-chip memory accesses is orders of magnitude higher than that of MACs [11,15,30], data movement usually poses greater challenges for energy-efficient DNN training [56]. Moreover, the ongoing development of low-precision training techniques [23,27,55] can potentially reduce the unit energy cost of MACs, but this would also make data movement account for a proportionally larger share of the overall training energy.…”
Section: Challenges of BNN Training (mentioning)
confidence: 99%
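A back-of-envelope calculation makes the point of this excerpt concrete. The energy numbers below are purely illustrative assumptions in arbitrary units, not measurements from the cited papers; they only encode the claim that a DRAM access costs far more than a MAC.

# Illustrative sketch: if low precision makes MACs cheaper but DRAM traffic does
# not shrink as much, data movement dominates the energy budget even more.
# All numbers are assumed, arbitrary units.
mac_energy = {"fp32": 1.0, "int8": 0.25}   # assumed ~4x cheaper MACs at low precision
dram_energy_per_access = 100.0             # assumed orders-of-magnitude costlier than a MAC
num_macs = 1e9
num_dram_accesses = 1e7

for label, mac_e in mac_energy.items():
    compute = mac_e * num_macs
    movement = dram_energy_per_access * num_dram_accesses
    share = movement / (compute + movement)
    print(f"{label}: data movement is {share:.0%} of the total training energy")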
“…One area uses uniform-precision quantization, where the whole model shares the same precision (Choukroun et al., 2019; Gong et al., 2019; Langroudi et al., 2019; Jin et al., 2020a; Bhalgat et al., 2020; Darvish Rouhani et al., 2020; Oh et al., 2021). Another direction studies mixed precision, which determines the bit-width for each layer through search algorithms, aiming at a better accuracy-efficiency trade-off (Dong et al., 2019; Wang et al., 2019; Habi et al., 2020; Fu et al., 2020; Yang & Jin, 2020; Zhao et al., 2021a; 2021b; Ma et al., 2021b). There are also binarized networks, which apply only 1-bit precision (Rastegari et al., 2016; Hubara et al., 2016; Cai et al., 2017; Bulat et al., 2020).…”
Section: Related Work (mentioning)
confidence: 99%
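The contrast between the uniform-precision and mixed-precision directions summarized in this excerpt can be sketched with a generic fake-quantizer. The snippet below is only an illustrative assumption: the function name, bit-widths, and the symmetric quantization scheme are not taken from any of the cited works.

# Illustrative sketch only: a generic symmetric fake-quantizer, used first with a
# single shared bit-width (uniform precision) and then with a hand-picked
# per-layer bit-width (mixed precision).
import numpy as np

def fake_quantize(x: np.ndarray, num_bits: int) -> np.ndarray:
    """Round x onto a symmetric num_bits grid, then map back to real values."""
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax + 1e-12
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

layers = {"conv1": np.random.randn(16, 3, 3, 3),   # dummy weights for illustration
          "conv2": np.random.randn(32, 16, 3, 3)}

# Uniform precision: every layer shares the same bit-width.
uniform = {name: fake_quantize(w, num_bits=8) for name, w in layers.items()}

# Mixed precision: per-layer bit-widths; the search-based methods cited above
# would select these automatically, here they are simply assumed.
bitwidths = {"conv1": 8, "conv2": 4}
mixed = {name: fake_quantize(w, bitwidths[name]) for name, w in layers.items()}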
“…While various low precision training techniques have been proposed to boost DNNs' training efficiency [4,30,42,50], most of these techniques adopt a fixed precision allocation strategy throughout the whole training process, leaving large room for further squeezing out bit-wise savings. Motivated by recent pioneering works, which advocate that (1) different DNN layers behave differently through the training process [34,45] and (2) different DNN training stages favor different training schemes [27], a few works [10,11,24] have proposed to adopt dynamic training precision, which varies the precision spatially (e.g., layer-wise precision allocation) and temporally (e.g., different precision in different training epochs) and shows promising training efficiency and optimality over static counterparts. However, existing dynamic low precision training methods rely on manually designed precision schedules [10,11,24], making it challenging to apply them directly to new models/tasks and limiting their achievable training efficiency.…”
Section: Introduction (mentioning)
confidence: 99%
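The "spatial plus temporal" dynamic precision idea summarized in this excerpt can be illustrated with a hand-crafted schedule. This is only a hedged sketch, not the schedule of PFQ [11], CPT [10], or FracTrain itself; every stage boundary and bit-width here is an assumption.

# Illustrative sketch only: precision varies over layers (spatial) and over
# training epochs (temporal); all numbers are assumed for illustration.
def precision_schedule(epoch: int, layer_idx: int, total_epochs: int = 160) -> int:
    """Return an illustrative forward bit-width for layer `layer_idx` at `epoch`."""
    progress = epoch / total_epochs
    # Temporal component: start low, grow the bit-width in stages.
    if progress < 0.25:
        base_bits = 4
    elif progress < 0.5:
        base_bits = 6
    else:
        base_bits = 8
    # Spatial component: give the first layer one extra bit, assuming early
    # layers are more quantization-sensitive (an assumption, not a cited result).
    return base_bits + (1 if layer_idx == 0 else 0)

# Example: bit-widths of layers 0 and 5 early and late in training.
for epoch in (10, 120):
    print(epoch, [precision_schedule(epoch, layer_idx) for layer_idx in (0, 5)])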
“…) dynamic low precision training (PFQ [11]), and (3) CPT [10]. Here, each row shows the precision schedule for the whole model of the baselines, where LDP adopts a learned layer-wise dynamic precision schedule to optimally balance the training efficiency and accuracy trade-off; and (b) LDP's learned spatial precision distribution and temporal precision schedule for ResNet-38@CIFAR-100, where different curves correspond to the spatial precision distributions of different residual blocks, and the fractional precision is due to the block-wise average and the moving average across iterations for better visualization.…”
(mentioning)
confidence: 99%