Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016
DOI: 10.18653/v1/p16-1159

Minimum Risk Training for Neural Machine Translation

Abstract: We propose minimum risk training for end-to-end neural machine translation. Unlike conventional maximum likelihood estimation, minimum risk training is capable of optimizing model parameters directly with respect to arbitrary evaluation metrics, which are not necessarily differentiable. Experiments show that our approach achieves significant improvements over maximum likelihood estimation on a state-of-the-art neural machine translation system across various language pairs. Transparent to architectures, our approach can be applied to arbitrary neural machine translation architectures.
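In outline, the objective described in the abstract is the expected loss (risk) under the model distribution, approximated over a sampled subset of candidate translations. The formulation below is a reconstruction from that description, with notation chosen here rather than quoted from the paper: Δ is a sentence-level loss derived from the evaluation metric (e.g. 1 − sentence-BLEU) and α is a sharpness hyperparameter for the renormalized distribution Q over the sampled set S(x).

\[
\tilde{R}(\theta) \;=\; \sum_{s=1}^{S} \sum_{y \in S(x^{(s)})} Q(y \mid x^{(s)}; \theta, \alpha)\, \Delta(y, y^{(s)}),
\qquad
Q(y \mid x^{(s)}; \theta, \alpha) \;=\; \frac{P(y \mid x^{(s)}; \theta)^{\alpha}}{\sum_{y' \in S(x^{(s)})} P(y' \mid x^{(s)}; \theta)^{\alpha}}
\]

Because the evaluation metric enters only through Δ computed on sampled translations, it need not be differentiable; gradients flow through the renormalized probabilities Q.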

Cited by 345 publications (358 citation statements). References 19 publications.

Citation statements (ordered by relevance):
“…For online translation services, decoding speed is a crucial factor to achieve a better user experience. Several recently proposed training methods (Shen et al., 2015; Wiseman and Rush, 2016) aim to solve the exposure bias problem, but require decoding the whole training set multiple times, which is extremely time-consuming for millions of sentences.…”
Section: Introduction
Mentioning confidence: 99%
“…A terminal reward is received when the policy finishes generating the sequence. [Shen et al., 2016] propose minimum risk training to minimize the expected loss on the training data, which is in the same spirit as the RL formulation of directly optimizing the evaluation metric. It has also been argued that immediate reward helps faster convergence and makes the reward signal less sparse.…”
Section: Related Work
Mentioning confidence: 99%
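To make the "expected loss on the training data" concrete, here is a minimal, self-contained sketch of how such a risk could be computed for one source sentence, assuming a set of sampled candidate translations with model log-probabilities. The sharpness parameter alpha and the toy sentence_error stand-in for 1 − sentence-BLEU are illustrative assumptions, not the authors' implementation.

import numpy as np

def sentence_error(candidate, reference):
    # Toy stand-in for 1 - sentence-BLEU: fraction of reference tokens missing
    # from the candidate. Any sentence-level loss derived from the evaluation
    # metric could be plugged in here.
    ref = reference.split()
    cand = set(candidate.split())
    missed = sum(1 for tok in ref if tok not in cand)
    return missed / max(len(ref), 1)

def expected_risk(log_probs, candidates, reference, alpha=5e-3):
    # Renormalize the model scores over the sampled candidates with a
    # sharpness hyperparameter alpha, then take the expectation of the
    # sentence-level error under that distribution.
    scaled = alpha * np.asarray(log_probs, dtype=float)
    q = np.exp(scaled - scaled.max())
    q /= q.sum()                       # Q(y | x; theta, alpha) over the sample
    errors = np.array([sentence_error(c, reference) for c in candidates])
    return float(np.dot(q, errors))    # risk to be minimized by gradient descent

if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidates = ["the cat sat on the mat",
                  "a cat is on a mat",
                  "the dog ran away"]
    log_probs = [-2.0, -3.5, -1.0]     # model log P(y | x), e.g. from an NMT decoder
    print(expected_risk(log_probs, candidates, reference))

In a full training loop, the candidates would be re-sampled from the model for every source sentence at every epoch, which is the repeated decoding of the training set that the citing paper points to as the main computational cost.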
“…Most existing work in neural machine translation focuses on handling rare words [12,20,15], integrating SMT strategies [6,28,24,21], designing better frameworks [23,14,16], and addressing low-resource scenarios [2,27,19].…”
Section: Related Work
Mentioning confidence: 99%