Previous translation models, such as rule-based machine translation (RBMT), statistical machine translation (SMT), hybrid machine translation (HMT), and earlier neural machine translation (NMT), have reached a performance bottleneck. Transformer-based models have become the preferred choice for English-language translation. For instance, Google's BERT organizes Transformer modules into bidirectional encoder representations. Unlike RankBrain, it does not need to evaluate previous searches to understand what users mean: it models users' search intent as well as the material the search engine has indexed, and it comprehends words, sentences, and whole passages much as humans do. Transformer-based models achieve remarkable translation quality improvements over other state-of-the-art benchmarks, demonstrating the great potential of the architecture. However, these gains come at the cost of growing model size and complexity, typically requiring parameters at the million scale, and the resulting memory and computation requirements are difficult for traditional computing systems to meet. The biggest challenge in applying the Transformer model is therefore deploying it efficiently on real-time or embedded devices. In this work, we propose a quantization scheme that reduces parameter and computation complexity, which is of great importance for promoting the use of the Transformer model. Our experimental results show that the original 32-bit floating-point Transformer model can be quantized to only 8 to 12 bits with negligible loss of translation quality. Meanwhile, our algorithm achieves a 2.6× to 4.0× compression ratio, which helps reduce the required complexity and energy during the inference phase.
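The abstract does not specify the exact quantization scheme, so as a rough illustration only, the following minimal sketch (all function names hypothetical) shows per-tensor symmetric uniform quantization, one common way to map 32-bit floating-point weights to k-bit signed integers; at 8 and 12 bits this yields the 4.0× and roughly 2.6× storage reductions the abstract reports, though the paper's actual method may differ.

```python
import numpy as np

def quantize_weights(w: np.ndarray, bits: int = 8):
    """Per-tensor symmetric uniform quantization of FP32 weights
    to signed `bits`-bit integer codes. Returns codes and the scale
    needed to map them back to floating point."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax      # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scale

# Hypothetical usage: quantize a random weight matrix and check the error.
w = np.random.randn(512, 512).astype(np.float32)
q, s = quantize_weights(w, bits=8)
w_hat = dequantize(q, s)
print("max abs reconstruction error:", np.max(np.abs(w - w_hat)))
print("storage reduction vs FP32 at 8 bits:", 32 / 8, "x")   # 4.0x
print("storage reduction vs FP32 at 12 bits:", 32 / 12)      # ~2.6x
```

In such a scheme, the quality loss the abstract mentions comes from the rounding step, which is why wider bit widths (12 bits) trade some compression for fidelity.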