FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training

Fu, Yonggan; You, Haoran; Zhao, Yang; Wang, Yue; Li, Chaojian; Gopalakrishnan, Kailash; Wang, Zhangyang; Lin, Yingyan

doi:10.48550/arxiv.2012.13113

Cited by 10 publications

(11 citation statements)

References 28 publications

“…The significant reduction of DRAM access is the major source of Shift-BNN's high energy efficiency. As various lower-precision training techniques [4,23,62] Scalability to larger sample size. In some high-risk applications, one may need a more robust BNN model to make decisions, thus requires training BNNs with a larger sample size to strictly approximate the loss function in Eq.1.…”

Section: Evaluation Results

mentioning

confidence: 99%

“…As the unit energy cost (J/bit) of off-chip memory accesses is orders of magnitude higher than that of MACs [11,15,30], data movement usually poses greater challenges for energy-efficient DNN training [56]. Moreover, the ongoing development of low-precision training techniques [23,27,55] can potentially reduce the unit energy cost of MACs, but this could also result in a proportionally higher impact on the overall training's energy efficiency from the data movement.…”

Section: Challenges of BNN Training

mentioning

confidence: 99%

Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving

Wan, Xia, Zhang et al. 2021

MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

Bayesian Neural Networks (BNNs) that possess a property of uncertainty estimation have been increasingly adopted in a wide range of safety-critical AI applications which demand reliable and robust decision making, e.g., self-driving, rescue robots, medical image diagnosis. The training procedure of a probabilistic BNN model involves training an ensemble of sampled DNN models, which induces orders of magnitude larger volume of data movement than training a single DNN model. In this paper, we reveal that the root cause for BNN training inefficiency originates from the massive off-chip data transfer by Gaussian Random Variables (GRVs). To tackle this challenge, we propose a novel design that eliminates all the off-chip data transfer by GRVs through the reversed shifting of Linear Feedback Shift Registers (LFSRs) without incurring any training accuracy loss. To efficiently support our LFSR reversion strategy at the hardware level, we explore the design space of the current DNN accelerators and identify the optimal computation mapping scheme to best accommodate our strategy. By leveraging this finding, we design and prototype the first highly efficient BNN training accelerator, named Shift-BNN, that is low-cost and scalable. Extensive evaluation on five representative BNN models demonstrates that Shift-BNN achieves an average of 4.9× (up to 10.8×) boost in energy efficiency and 1.6× (up to 2.8×) speedup over the baseline DNN training accelerator.

CCS Concepts: Computer systems organization → Neural networks; Hardware → Hardware accelerators.
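The reversed-shifting idea in this abstract can be illustrated with a small sketch. It assumes a generic 16-bit Fibonacci LFSR with taps at bits 16, 14, 13 and 11 (a common textbook configuration, not necessarily the one used in Shift-BNN), and it omits the conversion of LFSR bits into Gaussian samples. The function names are hypothetical. The point shown is only that an LFSR state sequence can be regenerated in reverse order from its last state, so the values need not be stored.

```python
MASK = 0xFFFF  # 16-bit register (assumed width for this sketch)


def lfsr_forward(state: int) -> int:
    """One forward shift of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (feedback << 15)


def lfsr_reverse(state: int) -> int:
    """Undo one forward shift, recovering the previous register state."""
    # The bit that was shifted out is recovered from the stored feedback bit
    # (now bit 15) and the tap bits, which each sit one position lower.
    dropped = ((state >> 15) ^ (state >> 1) ^ (state >> 2) ^ (state >> 4)) & 1
    return ((state << 1) & MASK) | dropped


def forward_states(seed: int, steps: int) -> list:
    """States produced by shifting forward `steps` times from `seed`."""
    states, s = [], seed
    for _ in range(steps):
        s = lfsr_forward(s)
        states.append(s)
    return states


def backward_states(last_state: int, steps: int) -> list:
    """The same `steps` states, regenerated in reverse order from the last one."""
    states, s = [last_state], last_state
    for _ in range(steps - 1):
        s = lfsr_reverse(s)
        states.append(s)
    return states


seed = 0xACE1
fwd = forward_states(seed, 8)
bwd = backward_states(fwd[-1], 8)
assert bwd == fwd[::-1]  # reverse shifting replays the sequence backwards
```

In this sketch only the final register state has to be kept between the two passes, which is the property the abstract relies on to avoid off-chip transfers.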

“…One area uses uniform-precision quantization where the model shares the same precision (Choukroun et al, 2019; Gong et al, 2019; Langroudi et al, 2019; Jin et al, 2020a; Bhalgat et al, 2020; Darvish Rouhani et al, 2020; Oh et al, 2021). Another direction studies mixed-precision that determines bit-width for each layer through search algorithms, aiming at better accuracy-efficiency trade-off (Dong et al, 2019; Wang et al, 2019; Habi et al, 2020; Fu et al, 2020; Yang & Jin, 2020; Zhao et al, 2021a;b; Ma et al, 2021b). There is also binarization network, which only applies 1-bit (Rastegari et al, 2016; Hubara et al, 2016; Cai et al, 2017; Bulat et al, 2020).…”

Section: Related Work

mentioning

confidence: 99%

F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization

Jin, Ren, Zhuang et al. 2022

Preprint

Neural network quantization is a promising compression technique to reduce memory footprint and save energy consumption, potentially leading to real-time inference. However, there is a performance gap between quantized and full-precision models. To reduce it, existing quantization approaches require high-precision INT32 or full-precision multiplication during inference for scaling or dequantization. This introduces a noticeable cost in terms of memory, speed, and required energy. To tackle these issues, we present F8Net, a novel quantization framework consisting of only fixed-point 8-bit multiplication. To derive our method, we first discuss the advantages of fixed-point multiplication with different formats of fixed-point numbers and study the statistical behavior of the associated fixed-point numbers. Second, based on the statistical and algorithmic analysis, we apply different fixed-point formats for weights and activations of different layers. We introduce a novel algorithm to automatically determine the right format for each layer during training. Third, we analyze a previous quantization algorithm, parameterized clipping activation (PACT), and reformulate it using fixed-point arithmetic. Finally, we unify the recently proposed method for quantization finetuning and our fixed-point approach to show the potential of our method. We verify F8Net on ImageNet for MobileNet V1/V2 and ResNet18/50. Our approach achieves comparable and better performance, when compared not only to existing quantization techniques with INT32 multiplication or floating-point arithmetic, but also to the full-precision counterparts, achieving state-of-the-art performance.
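As a rough illustration of what multiplying fixed-point 8-bit numbers with different formats involves, the sketch below uses generic signed fixed-point arithmetic. It is not F8Net's algorithm: the per-layer format selection described in the abstract is not modeled, and the helper names (`to_fixed`, `from_fixed`, `fixed_mul`) and the round-to-nearest shift are assumptions made for this example.

```python
def to_fixed(x: float, frac_bits: int, word_bits: int = 8) -> int:
    """Quantize x to a signed fixed-point integer with `frac_bits` fractional bits."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    q = round(x * (1 << frac_bits))
    return max(lo, min(hi, q))  # saturate to the 8-bit range


def from_fixed(q: int, frac_bits: int) -> float:
    """Interpret a fixed-point integer as a real number."""
    return q / (1 << frac_bits)


def fixed_mul(qa: int, fa: int, qb: int, fb: int,
              out_frac: int, word_bits: int = 8) -> int:
    """Multiply two fixed-point values and rescale the result with a bit shift."""
    prod = qa * qb                 # the product carries fa + fb fractional bits
    shift = fa + fb - out_frac     # bits to drop to reach the output format
    if shift > 0:
        prod = (prod + (1 << (shift - 1))) >> shift  # shift with rounding
    else:
        prod <<= -shift
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, prod))  # saturate to the 8-bit range


a = to_fixed(0.75, frac_bits=6)    # 0.75 with 6 fractional bits
b = to_fixed(-1.5, frac_bits=5)    # -1.5 with 5 fractional bits
c = fixed_mul(a, 6, b, 5, out_frac=4)
print(from_fixed(c, 4))            # -1.125
```

The rescaling step here is a shift rather than a multiplication by a floating-point or INT32 scale, which is the kind of cost the abstract says existing approaches incur.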

“…While various low precision training techniques have been proposed to boost DNNs' training efficiency [4,30,42,50], most of these techniques adopt a fixed precision allocation strategy throughout the whole training process, leaving a large room for further squeezing out bit-wise savings. Motivated by the recent pioneering works, which advocate that (1) different DNN layers behave differently through the training process [34,45] and (2) different DNN training stages favor different training schemes [27], a few pioneering works [10,11,24] have proposed to adopt dynamic training precision, which varies the precision spatially (e.g., layer-wise precision allocation) and temporally (e.g., different precision in different training epochs) and shows promising training efficiency and optimality over their static counterparts. However, existing dynamic low precision training methods rely on manually designed dynamic precision schedules [10,11,24], thus making it challenging to be directly applied to new models/tasks and limiting their achievable training efficiency.…”

Section: Introduction

mentioning

confidence: 99%

“…) dynamic low precision training, (PFQ [11] and (3) CPT [10]). Here, each row shows the precision schedule for the whole model of the baselines, where LDP adopts a learned layer-wise dynamic precision schedule to optimally balance the training efficiency and accuracy trade-off; and (b) LDP's learned spatial precision distribution and temporal precision schedule for ResNet-38@CIFAR-100, where different curves correspond to spatial precision distributions of different residual blocks and the fractional precision is due to the block-wise average and moving average among iterations for better visualization.…”

mentioning

confidence: 99%

LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference

Yu, Fu, Wang et al. 2022

Preprint

Self Cite

Low precision deep neural network (DNN) training is one of the most effective techniques for boosting DNNs' training efficiency, as it trims down the training cost from the finest bit level. While existing works mostly fix the model precision during the whole training process, a few pioneering works have shown that dynamic precision schedules help DNNs converge to a better accuracy while leading to a lower training cost than their static precision training counterparts. However, existing dynamic low precision training methods rely on manually designed precision schedules to achieve advantageous efficiency and accuracy trade-offs, limiting their more comprehensive practical applications and achievable performance. To this end, we propose LDP, a Learnable Dynamic Precision DNN training framework that can automatically learn a temporally and spatially dynamic precision schedule during training towards optimal accuracy and efficiency trade-offs. It is worth noting that LDP-trained DNNs are by nature efficient during inference. Furthermore, we visualize the resulting temporal and spatial precision schedule and distribution of LDP trained DNNs on different tasks to better understand the corresponding DNNs' characteristics at different training stages and DNN layers both during and after training, drawing insights for promoting further innovations. Extensive experiments and ablation studies (seven networks, five datasets, and three tasks) show that the proposed LDP consistently outperforms state-of-the-art (SOTA) low precision DNN training techniques in terms of training efficiency and achieved accuracy trade-offs. For example, in addition to having the advantage of being automated, our LDP achieves a 0.31% higher accuracy with a 39.1% lower computational cost when training ResNet-20 on CIFAR-10 as compared with the best SOTA method.
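To make the notion of a temporally and spatially dynamic precision schedule concrete, here is a minimal sketch of a hand-written schedule of the kind the abstract contrasts LDP with. Nothing in it is learned, and it is not the method of LDP, FracTrain, or any cited work: the linear ramp, the rule that keeps the first layer at high precision, and the names `precision_for` and `quantize` are all assumptions chosen for illustration.

```python
def quantize(values, bits):
    """Uniform symmetric quantization of a list of floats to `bits` bits."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [float(v) for v in values]
    levels = (1 << (bits - 1)) - 1        # positive quantization levels
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def precision_for(layer_idx, epoch, total_epochs, low=4, high=8):
    """Hypothetical schedule: bit-width depends on both layer and epoch."""
    if layer_idx == 0:
        return high                       # spatial rule: first layer stays at high precision
    # Temporal rule: ramp linearly from `low` to `high` bits over training.
    progress = epoch / max(1, total_epochs - 1)
    return low + round(progress * (high - low))


weights = {0: [0.4, -1.0, 0.9], 1: [0.4, -1.0, 0.9]}
for epoch in (0, 5, 9):
    for layer_idx, w in weights.items():
        bits = precision_for(layer_idx, epoch, total_epochs=10)
        print(epoch, layer_idx, bits, quantize(w, bits))
```

A learnable scheme would replace `precision_for` with bit-widths optimized during training; this sketch only shows where such a schedule plugs into the quantization step.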
