2019 · Preprint
DOI: 10.48550/arxiv.1910.04877
Bit Efficient Quantization for Deep Neural Networks

Cited by 6 publications (5 citation statements)
References 6 publications
“…Traditionally, neural networks relied on 32-bit numerical operations for training and evaluation. It was found, however, that inference can work fine with 8-bit [18], 3-bit [19], 2-bit [20], or even 1-bit (binary) weights and operations [21]. Training a model is more difficult, but models have been successfully trained using 4-bit weights [4].…”
Section: A. Managing Model Versions
confidence: 99%
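The bit widths listed in the statement above can be illustrated with a minimal sketch of uniform symmetric weight quantization. This is not the quantization scheme of the cited paper; the function name and the per-tensor scaling choice are assumptions made for illustration only.

```python
# Minimal sketch of uniform n-bit weight quantization (illustrative only,
# not the cited paper's method). Symmetric, per-tensor scaling.
import numpy as np

def quantize_weights(w: np.ndarray, bits: int) -> np.ndarray:
    """Quantize-dequantize w onto a signed n-bit grid."""
    if bits == 1:
        # Binary weights: sign times the mean magnitude, in the spirit of
        # BinaryConnect-style schemes.
        return np.sign(w) * np.abs(w).mean()
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax        # map the largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                      # dequantize back to float

w = np.random.randn(256, 256).astype(np.float32)
for bits in (8, 3, 2, 1):
    err = np.abs(w - quantize_weights(w, bits)).mean()
    print(f"{bits}-bit mean absolute error: {err:.4f}")
```

Running the loop shows the quantization error growing as the bit width shrinks, which is why inference tolerates low-bit weights better than training does.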
“…Some approaches, e.g. [32,24,6], which only quantize weights to fixed point, are hard to adopt for accelerating the real inference process. Besides, some methods [27,33,5] do quantize both weights and activations to fixed point, but they usually need particular hardware or software to facilitate the implementation of quantized inference.…”
Section: Industrial Applicability
confidence: 99%
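The distinction drawn above, between quantizing weights only and quantizing both weights and activations so the multiply itself runs on integers, can be sketched as follows. The affine per-tensor scale/zero-point handling below is a simplified assumption, not any specific framework's implementation.

```python
# Hedged sketch: quantize both weights and activations to int8 so the
# matrix multiply accumulates in integers, then rescale the result.
import numpy as np

def affine_quantize(x: np.ndarray, bits: int = 8):
    """Per-tensor affine quantization to unsigned n-bit integers."""
    qmin, qmax = 0, 2 ** bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def int_matmul(a: np.ndarray, w: np.ndarray) -> np.ndarray:
    qa, sa, za = affine_quantize(a)
    qw, sw, zw = affine_quantize(w)
    # Subtract zero points, accumulate in int32, then rescale to float.
    acc = (qa - za) @ (qw - zw)
    return acc * (sa * sw)

a = np.random.rand(4, 64).astype(np.float32)
w = np.random.randn(64, 32).astype(np.float32)
print(np.abs(a @ w - int_matmul(a, w)).mean())  # small quantization error
```

Because the accumulation happens over integers, this style of scheme is what needs (and benefits from) the dedicated integer hardware or software support the citing authors mention.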
“…However, less essential operations can also be pruned after training [1535,1536]. Another approach is quantization, where ANN bit depths are decreased, often to efficient integer instructions, to increase inference throughput [1537,1538]. Quantization often decreases performance; however, the amount of quantization can be adapted to ANN components to optimize performance-throughput tradeoffs [1539].…”
Section: Deployment
confidence: 99%
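The post-training pruning mentioned alongside quantization in this statement is often done by weight magnitude. The sketch below is one common variant under that assumption; the function name and the global (rather than per-layer) threshold are illustrative choices, not taken from the cited works.

```python
# Hedged sketch of post-training magnitude pruning: zero out the smallest
# fraction of weights, removing the "less essential" operations.
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the smallest `sparsity` fraction of weights by absolute value."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

w = np.random.randn(512, 512)
pruned = magnitude_prune(w, sparsity=0.9)
print(f"fraction of zeros: {(pruned == 0).mean():.2f}")  # ~0.90
```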