“…Quantization (Low Precision Inference): A common solution is to compress NN models with quantization (Asanovic and Morgan, 1991; Hubara et al, 2016; Rastegari et al, 2016; Zhou et al, 2016, 2017; Cai et al, 2017, 2020b; Choi et al, 2018; Jacob et al, 2018; Zhang et al, 2018a; Dong et al, 2019; Wang et al, 2019c; Chin et al, 2020; Gholami et al, 2021), in which weights and activations are stored and computed at low bit-precision. A notable work here is Deep Compression (Han et al, 2016), which combines pruning, quantization, and Huffman coding; applied to the SqueezeNet model discussed above, it reduced the model footprint to 510x smaller than AlexNet.…”
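To make the core idea concrete, below is a minimal sketch of symmetric uniform 8-bit quantization, the simplest form of the technique: float32 weights are replaced by int8 values plus a single scale factor, cutting storage by 4x. This is an illustrative example, not the method of any specific paper cited above; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric uniform quantization of a weight tensor to int8."""
    # Map the largest weight magnitude to the edge of the int8 range.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the int8 values."""
    return q.astype(np.float32) * scale

# Storing int8 values plus one scale instead of float32 weights
# shrinks the footprint by 4x; lower bit-widths compress further.
w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
print(q.nbytes / w.nbytes)          # 0.25, i.e. 4x smaller
print(np.abs(dequantize(q, s) - w).max())  # small rounding error
```

Methods like Deep Compression go well beyond this sketch, combining quantization with pruning and entropy coding, and the cited works explore lower bit-widths, non-uniform quantizers, and training-time (quantization-aware) approaches.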