From Research to Production and Back: Ludicrously Fast Neural Machine Translation

Kim, Young Jin; Junczys-Dowmunt, Marcin; Hassan, Hany; Aji, Alham Fikri; Heafield, Kenneth; Grundkiewicz, Roman; Bogoychev, Nikolay

doi:10.18653/v1/d19-5632

Cited by 54 publications

(73 citation statements)

References 11 publications

“…Team Marian's submission (Kim et al, 2019) was based on their submission to the shared task the previous year, consisting of Transformer models optimized in a number of ways (Junczys-Dowmunt et al, 2018) […] a number of improvements. Improvements were made to teacher-student training by (1) creating more data for teacher-student training using backward, then forward translation, (2) using multiple teachers to generate better distilled data for training student models.…”

Section: Team Marian (mentioning)

confidence: 99%

Findings of the Third Workshop on Neural Generation and Translation

Hayashi, Oda, Birch et al. 2019

Proceedings of the 3rd Workshop on Neural Generation and Translation

This document describes the findings of the Third Workshop on Neural Generation and Translation, held in concert with the annual conference of the Empirical Methods in Natural Language Processing (EMNLP 2019). First, we summarize the research trends of papers presented in the proceedings. Second, we describe the results of the two shared tasks 1) efficient neural machine translation (NMT) where participants were tasked with creating NMT systems that are both accurate and efficient, and 2) document-level generation and translation (DGT) where participants were tasked with developing systems that generate summaries from structured data, potentially with assistance from text in another language.

“…This problem setup is closely related to model distillation (Hinton et al, 2014): training a student model to imitate the predictions of a teacher. Distillation has widespread use in MT, including reducing architecture size (Kim and Rush, 2016; Kim et al, 2019), creating multilingual models (Tan et al, 2019), and improving non-autoregressive generation (Ghazvininejad et al, 2019; Stern et al, 2019). Model stealing differs from distillation because the victim's (i.e., teacher's) training data is unknown.…”

Section: Past Work On Distillation and Stealing (mentioning)

confidence: 99%

Imitation Attacks and Defenses for Black-box Machine Translation Systems

Wallace, Stern, Song 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Adversaries may look to steal or attack black-box NLP systems, either for financial gain or to exploit model errors. One setting of particular interest is machine translation (MT), where models have high commercial value and errors can be costly. We investigate possible exploitations of black-box MT systems and explore a preliminary defense against such threats. We first show that MT systems can be stolen by querying them with monolingual sentences and training models to imitate their outputs. Using simulated experiments, we demonstrate that MT model stealing is possible even when imitation models have different input data or architectures than their target models. Applying these ideas, we train imitation models that reach within 0.6 BLEU of three production MT systems on both high-resource and low-resource language pairs. We then leverage the similarity of our imitation models to transfer adversarial examples to the production systems. We use gradient-based attacks that expose inputs which lead to semantically incorrect translations, dropped content, and vulgar model outputs. To mitigate these vulnerabilities, we propose a defense that modifies translation outputs in order to misdirect the optimization of imitation models. This defense degrades the adversary's BLEU score and attack success rate at some cost in the defender's BLEU and inference speed.

“…The goal of the distillation step is to reduce the complexity of the original training data $D_{tr}$. Instead of achieving this with a single deep teacher as in Kim and Rush (2016) and Kim et al (2019), we use the multiple domain-specific teachers trained in step 1. Each training set $D_{tr}^{i}$ is translated with its corresponding deep teacher, resulting in a distilled version $D_{dist(tr)}^{i}$ of that set.…”

Section: In-domain Distillation (mentioning)

confidence: 99%

Distilling Multiple Domains for Neural Machine Translation

Currey, Mathur, Dinu 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Neural machine translation achieves impressive results in high-resource conditions, but performance often suffers when the input domain is low-resource. The standard practice of adapting a separate model for each domain of interest does not scale well in practice from both a quality perspective (brittleness under domain shift) as well as a cost perspective (added maintenance and inference complexity). In this paper, we propose a framework for training a single multi-domain neural machine translation model that is able to translate several domains without increasing inference time or memory usage. We show that this model can improve translation on both high- and low-resource domains over strong multi-domain baselines. In addition, our proposed model is effective when domain labels are unknown during training, as well as robust under noisy data conditions.
