2022
DOI: 10.3390/info13020088

Knowledge Distillation: A Method for Making Neural Machine Translation More Efficient

Abstract: Neural machine translation (NMT) systems have greatly improved the quality available from machine translation (MT) compared to statistical machine translation (SMT) systems. However, these state-of-the-art NMT models need much more computing power and data than SMT models, a requirement that is unsustainable in the long run and of very limited benefit in low-resource scenarios. To some extent, model compression—more specifically state-of-the-art knowledge distillation techniques—can remedy this. In this work, …
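The abstract presents knowledge distillation as a way to compress large NMT models. As a rough illustration of the general idea (not the paper's specific setup), the sketch below shows word-level distillation, where a student is trained against the teacher's softened output distribution; the temperature, loss weighting, and tensor shapes are illustrative assumptions.

```python
# Minimal sketch of a word-level knowledge distillation loss (assumed setup,
# not the configuration used in the paper).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5, pad_id=0):
    """Blend cross-entropy on the reference translation with a KL term that
    pulls the student towards the teacher's softened token distribution.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    targets: (batch, seq_len) gold token ids
    """
    # Softened teacher distribution and student log-probabilities.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)

    # Per-token KL divergence between student and teacher.
    kd = F.kl_div(log_student, soft_teacher, reduction="none").sum(-1)

    # Standard cross-entropy against the gold targets.
    ce = F.cross_entropy(student_logits.transpose(1, 2), targets,
                         ignore_index=pad_id, reduction="none")

    # Mask out padding before averaging.
    mask = (targets != pad_id).float()
    kd = (kd * mask).sum() / mask.sum()
    ce = (ce * mask).sum() / mask.sum()
    return alpha * (temperature ** 2) * kd + (1.0 - alpha) * ce
```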

Cited by 12 publications (6 citation statements)
References 17 publications
“…In light of such developments, a 'green report' was incorporated into adaptNMT whereby the kgCO2 generated during model development is logged. This is very much in line with the industry trend of quantifying the impact of NLP on the environment; indeed, Jooste et al (2022a) have demonstrated that high-performing MT systems can be built with much lower footprints, which not only reduce emissions, but also in the post-deployment phase deliver savings of almost 50% in energy costs for a real translation company.…”
Section: Stochastic Nuances (supporting)
confidence: 70%
“…Liang et al [118] tried to improve the student model for the translation task by training a teacher with multiple sub-networks to produce various output variants. To improve NMT efficiency, Jooste et al [119] employed sequence-level knowledge distillation, using small student models to distill knowledge from large teacher models. Quantization is a method of compressing neural networks by reducing the precision of their weights.…”
Section: Knowledge Distillation (mentioning)
confidence: 99%
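The citation above refers to sequence-level knowledge distillation, in which the student is trained on the teacher's own translations of the training sources rather than on the human references. The following sketch shows how such a distilled corpus could be generated, assuming a Hugging Face-style seq2seq teacher; the checkpoint name and decoding settings are placeholders, not the configuration of the cited work.

```python
# Minimal sketch of building a sequence-level distillation corpus
# (assumed Hugging Face API usage; checkpoint and settings are placeholders).
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

teacher_name = "Helsinki-NLP/opus-mt-de-en"  # assumed teacher checkpoint
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForSeq2SeqLM.from_pretrained(teacher_name).eval()

@torch.no_grad()
def make_distilled_corpus(source_sentences, num_beams=5, max_length=128):
    """Re-translate the training sources with the teacher; the (smaller)
    student is then trained on these synthetic targets instead of the
    human references, with ordinary cross-entropy as in normal NMT training."""
    distilled = []
    for src in source_sentences:
        batch = tokenizer(src, return_tensors="pt", truncation=True)
        hyp = teacher.generate(**batch, num_beams=num_beams,
                               max_length=max_length)
        distilled.append(tokenizer.decode(hyp[0], skip_special_tokens=True))
    return distilled
```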
“…Due to the introduction of AI to NMT systems, the quality of MT outputs has significantly increased [10]. This has implications not only for professional translation but also in the classroom, as students start to shift their attention to MT results rather than to other translation strategies.…”
Section: Neural Machine Translation (mentioning)
confidence: 99%