C. Luschi scite author profile

Modern deep neural network training is typically based on mini-batch stochastic gradient optimization. While the use of large mini-batches increases the available computational parallelism, small batch training has been shown to provide improved generalization performance and allows a significantly smaller memory footprint, which might also be exploited to improve machine throughput. In this paper, we review common assumptions on learning rate scaling and training duration, as a basis for an experimental comparison of test performance for different mini-batch sizes. We adopt a learning rate that corresponds to a constant average weight update per gradient calculation (i.e., per unit cost of computation), and point out that this results in a variance of the weight updates that increases linearly with the mini-batch size m. The collected experimental results for the CIFAR-10, CIFAR-100 and ImageNet datasets show that increasing the mini-batch size progressively reduces the range of learning rates that provide stable convergence and acceptable test performance. On the other hand, small mini-batch sizes provide more up-to-date gradient calculations, which yields more stable and reliable training. The best performance has been consistently obtained for mini-batch sizes between m = 2 and m = 32, which contrasts with recent work advocating the use of mini-batch sizes in the thousands.
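As a rough illustration of the scaling argument in the abstract above, the following sketch simulates a single scalar weight whose learning rate grows linearly with the mini-batch size m. The function name `update_stats`, its parameters (`base_lr`, `mu`, `sigma`), and the i.i.d. Gaussian per-example gradients are all assumptions made for this example; they are not taken from the paper's experiments.

```python
import random
import statistics


def update_stats(m, base_lr=0.01, mu=0.5, sigma=1.0, trials=20000, seed=0):
    """Simulate one weight step for mini-batch size m with lr = base_lr * m.

    Assumption: per-example gradients are i.i.d. Gaussian(mu, sigma).
    Returns (mean step per gradient calculation, variance of the step).
    """
    rng = random.Random(seed)
    lr = base_lr * m  # learning rate scaled linearly with the batch size
    steps = []
    for _ in range(trials):
        grads = [rng.gauss(mu, sigma) for _ in range(m)]
        # Step magnitude: lr times the mini-batch average gradient,
        # which equals base_lr times the sum of the m gradients.
        steps.append(lr * sum(grads) / m)
    mean_per_example = statistics.mean(steps) / m
    return mean_per_example, statistics.pvariance(steps)


for m in (1, 4, 16):
    mean_per_example, variance = update_stats(m)
    print(f"m={m:2d}  mean step per example={mean_per_example:.5f}  "
          f"step variance={variance:.6f}")
```

Under these assumptions the mean step per gradient calculation stays close to `base_lr * mu` for every m, while the variance of the step grows roughly in proportion to m, since the step is `base_lr` times a sum of m independent gradients.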


Iterative channel estimation using soft decision feedback

Sandell, Luschi, Strauch et al.


Exact and approximated expressions of the log-likelihood ratio for 16-QAM signals

Allpress, Luschi, Felix


Nonparametric trellis equalization in the presence of non-Gaussian interference

Luschi, Mulgrew. 2003. IEEE Trans. Commun.


8-bit Numerical Formats for Deep Neural Networks

Noune, Jones, Justus et al. 2022. Preprint.


Given the current trend of increasing size and complexity of machine learning architectures, it has become of critical importance to identify new approaches to improve the computational efficiency of model training. In this context, we address the advantages of floating-point over fixed-point representation, and present an in-depth study on the use of 8-bit floating-point number formats for activations, weights, and gradients for both training and inference. We explore the effect of different bit-widths for exponents and significands and different exponent biases. The experimental results demonstrate that a suitable choice of these low-precision formats enables faster training and reduced power consumption without any degradation in accuracy for a range of deep learning models for image classification and language processing.
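To make the roles of exponent width, significand width, and exponent bias concrete, here is a minimal sketch of decoding an 8-bit pattern into a real value. The function `decode_fp8` and its parameter names are hypothetical, and the layout is an assumption: an IEEE-754-style sign, exponent, significand split with subnormals and no bit patterns reserved for infinity or NaN. The formats studied in the paper may handle special values differently.

```python
def decode_fp8(byte, exp_bits=4, man_bits=3, bias=7):
    """Decode an 8-bit pattern laid out as sign | exponent | significand.

    Assumption: IEEE-754-style encoding with subnormals and no encodings
    reserved for infinity or NaN (illustrative only).
    """
    if exp_bits + man_bits != 7:
        raise ValueError("sign + exponent + significand bits must total 8")
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")

    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    fraction = man / (1 << man_bits)

    if exp == 0:
        # Subnormal: no implicit leading one, fixed minimum exponent.
        return sign * fraction * 2.0 ** (1 - bias)
    # Normal: implicit leading one, exponent shifted by the bias.
    return sign * (1.0 + fraction) * 2.0 ** (exp - bias)


# Largest finite value for three hypothetical configurations.
print(decode_fp8(0x7F, exp_bits=4, man_bits=3, bias=7))   # 480.0
print(decode_fp8(0x7F, exp_bits=4, man_bits=3, bias=8))   # 240.0
print(decode_fp8(0x7F, exp_bits=5, man_bits=2, bias=15))  # 114688.0
```

In this sketch, moving a bit from the significand to the exponent trades precision for dynamic range, while changing the bias shifts the whole representable range up or down by powers of two without changing the relative spacing of values.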



C. Luschi

Revisiting Small Batch Training for Deep Neural Networks
