2019
DOI: 10.1631/fitee.1800469

Binary neural networks for speech recognition

Cited by 23 publications (13 citation statements)
References 30 publications
“…Dai et al. [20] proposed a hidden-layer LSTM with a grow-and-prune training method to address the problems of model redundancy and runtime delay. Qian et al. [21] introduced binary neural networks for acoustic modeling in speech recognition. Mori et al. [22] applied Tensor-Train decomposition to the weight matrices of the recurrent network to reduce the number of ASR parameters.…”
Section: Related Work
confidence: 99%
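The Tensor-Train factorization mentioned for [22] can be illustrated with a small two-core example: the weight matrix is reshaped so that a single truncated SVD yields two TT cores, sharply cutting the parameter count. This is a minimal sketch under assumed shapes and rank; the helper name `tt2_decompose` is hypothetical, not code from the cited work.

```python
import numpy as np

def tt2_decompose(w, m=(16, 32), n=(16, 32), rank=8):
    """Factor a (m1*m2 x n1*n2) matrix into two TT cores via truncated SVD."""
    m1, m2 = m
    n1, n2 = n
    # Group row/column indices into (m1, n1) and (m2, n2) pairs, so a single
    # SVD over the grouped axes yields a two-core Tensor-Train factorization.
    t = w.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    u, s, vt = np.linalg.svd(t, full_matrices=False)
    core1 = u[:, :rank] * s[:rank]   # shape (m1*n1, rank)
    core2 = vt[:rank]                # shape (rank, m2*n2)
    return core1, core2

w = np.random.default_rng(0).normal(size=(512, 512))
g1, g2 = tt2_decompose(w)
# Parameter count drops from 512*512 = 262144 to 8*(16*16 + 32*32) = 10240.
print(f"original {w.size} parameters -> TT cores {g1.size + g2.size}")
```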
“…An ultimate goal for many data- and resource-intensive deep learning based AI applications, including ASR systems, is to derive “lossless” model compression approaches that allow high-performance, low-footprint speech recognition systems to be constructed while incurring minimal performance degradation. To this end, one efficient solution is to use low-bit deep neural network (DNN) quantization techniques [17,18,19,20], which have drawn increasing interest in the machine learning and speech technology communities in recent years. By replacing floating-point weights with low-precision values, the resulting quantization methods can significantly reduce model size and inference time without modifying the model architecture.…”
Section: Introduction
confidence: 99%
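As a concrete illustration of the low-bit quantization described above, here is a minimal sketch of symmetric uniform weight quantization in NumPy. The function name `quantize_uniform` and the 4-bit setting are assumptions for the demo, not the method of any particular cited paper.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Quantize weights to 2^n_bits symmetric levels, then de-quantize."""
    q_max = 2 ** (n_bits - 1) - 1          # e.g. 7 for 4-bit
    scale = np.abs(w).max() / q_max        # per-tensor step size
    q = np.clip(np.round(w / scale), -q_max, q_max)
    return q * scale                       # low-precision weights in float form

w = np.random.default_rng(1).normal(size=(512, 512))
w_q = quantize_uniform(w, n_bits=4)
print("mean absolute quantization error:", np.abs(w - w_q).mean())
```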
“…Another powerful family of techniques that has recently drawn increasing interest across the machine learning, computer vision, and speech technology communities for solving this problem is low-bit DNN quantization [31]-[37], [52], [57], [58], [62], [74], [75]. By replacing floating-point DNN parameters with low-precision values, for example binary numbers, model sizes can be dramatically reduced without changing the DNN architecture [32], [57], [73].…”
Section: Introduction
confidence: 99%
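The binary case mentioned above, where parameters become {-1, +1} values, can be sketched as sign-based binarization with a per-tensor scaling factor, in the spirit of BinaryConnect/XNOR-Net style methods. Names and shapes below are illustrative, not taken from the cited works.

```python
import numpy as np

def binarize_weights(w: np.ndarray):
    """Return {-1, +1} weights plus a per-tensor scaling factor alpha."""
    alpha = float(np.abs(w).mean())        # magnitude-preserving scale
    w_bin = np.where(w >= 0.0, 1.0, -1.0)  # 1-bit weight codes
    return w_bin, alpha

# Usage: a dense-layer forward pass with binarized weights.
rng = np.random.default_rng(2)
w = rng.normal(size=(256, 128))
x = rng.normal(size=(1, 256))
w_bin, alpha = binarize_weights(w)
y = alpha * (x @ w_bin)                    # binary weights, float activations
```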
“…1) To the best of our knowledge, this paper presents the first work in the speech technology community to apply mixed precision DNN quantization techniques to both LSTM-RNN and Transformer based NNLMs. In contrast, prior research within the speech community in this direction largely focused on uniform precision quantization of convolutional neural network (CNN) acoustic models [62] and LSTM-RNN language models [57], [58], [75].…”
Section: Introduction
confidence: 99%
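A minimal sketch of the mixed precision idea follows: different layers receive different bit-widths and are quantized uniformly at their assigned precision. The layer names and per-layer bit assignment below are assumptions for illustration, not the cited paper's learned configuration.

```python
import numpy as np

def quantize_uniform(w, n_bits):
    """Symmetric uniform quantization to 2^n_bits levels (de-quantized)."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / q_max
    return np.clip(np.round(w / scale), -q_max, q_max) * scale

rng = np.random.default_rng(3)
layers = {"embedding": rng.normal(size=(1000, 64)),
          "recurrent": rng.normal(size=(256, 256)),
          "output": rng.normal(size=(64, 1000))}
# Hypothetical per-layer bit assignment; in practice it would be chosen by
# a sensitivity measure rather than fixed by hand.
bit_widths = {"embedding": 8, "recurrent": 2, "output": 4}

for name, w in layers.items():
    w_q = quantize_uniform(w, bit_widths[name])
    print(f"{name}: {bit_widths[name]}-bit, "
          f"mean abs error {np.abs(w - w_q).mean():.4f}")
```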