Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence 2017
DOI: 10.24963/ijcai.2017/432

Sequence Prediction with Unlabeled Data by Reward Function Learning

Abstract: Reinforcement learning (RL), which has been successfully applied to sequence prediction, introduces reward as a sequence-level supervision signal to evaluate the quality of a generated sequence. Existing RL approaches use the ground-truth sequence to define the reward, which limits the application of RL techniques to labeled data. Since labeled data is usually scarce and/or costly to collect, it is desirable to leverage large-scale unlabeled data. In this paper, we extend existing RL methods for sequence prediction …
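The abstract only sketches the approach, so the following is a rough illustration of the general idea rather than the paper's actual algorithm: a REINFORCE-style update in which the sequence-level reward comes from a learned reward model instead of a metric computed against a ground-truth reference, so that unlabeled source-side data can also provide training signal. All names (TinySeqPolicy, reward_model) and the toy reward are hypothetical.

```python
# Hypothetical sketch (not the paper's exact method): policy-gradient training of a
# sequence generator where the reward is produced by a learned reward model r(x, y)
# rather than by comparing the output to a ground-truth reference.
import torch
import torch.nn as nn

class TinySeqPolicy(nn.Module):
    """Toy autoregressive policy over a small vocabulary (stand-in for an NMT decoder)."""
    def __init__(self, vocab=20, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def sample(self, max_len=5):
        tok = torch.zeros(1, 1, dtype=torch.long)        # <bos> token
        h, log_probs, toks = None, [], []
        for _ in range(max_len):
            o, h = self.rnn(self.embed(tok), h)
            dist = torch.distributions.Categorical(logits=self.out(o[:, -1]))
            sampled = dist.sample()                       # next token, sampled from the policy
            log_probs.append(dist.log_prob(sampled))
            tok = sampled.unsqueeze(0)
            toks.append(sampled.item())
        return toks, torch.stack(log_probs).sum()

def reward_model(source_tokens, sampled_tokens):
    # Placeholder for a learned reward model scoring (source, output) pairs;
    # here it is just a toy diversity score so the sketch runs end to end.
    return float(len(set(sampled_tokens))) / len(sampled_tokens)

policy = TinySeqPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

unlabeled_sources = [[1, 2, 3], [4, 5, 6]]                # source sentences with no references
for x in unlabeled_sources:
    y, log_p = policy.sample()
    r = reward_model(x, y)                                # learned reward, not BLEU vs. a reference
    loss = -r * log_p                                     # REINFORCE: maximize expected reward
    opt.zero_grad()
    loss.backward()
    opt.step()
```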

Cited by 11 publications (6 citation statements) | References 9 publications
“…Zhang and Zong (2016) propose a self-learning approach to generate synthetic data from the source-side monolingual data, which is a semi-supervised method. Wu et al. (2017) leverage the source-side monolingual data to train the NMT system by learning a reward function in a reinforcement learning framework.…”
Section: Improving NMT by Monolingual Data
confidence: 99%
“…To address the inconsistency issue, reinforcement learning (RL) methods have been adopted to optimize sequence-level objectives. For example, policy optimization methods such as REINFORCE (Ranzato et al., 2016; Wu et al., 2017b) and actor-critic (Bahdanau et al., 2017) are leveraged for sequence generation tasks, including NMT. In the machine translation community, a similar method has been proposed under the name ‘minimum risk training’ (Shen et al., 2016).…”
Section: Introduction
confidence: 99%
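The sequence-level objective that these policy-gradient methods optimize can be written as an expected reward over sampled outputs. The notation below is generic (mine, not taken verbatim from the cited papers): x is a source sentence, y a sampled output, and R(y) a sequence-level reward such as sentence BLEU or, in this paper's setting, a learned reward.

```latex
% Expected-reward objective and its REINFORCE gradient estimator.
\mathcal{L}(\theta) = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\!\left[ R(y) \right],
\qquad
\nabla_\theta \mathcal{L}(\theta)
  = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\!\left[ R(y)\, \nabla_\theta \log p_\theta(y \mid x) \right].
```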
“…Even though RL-based models are difficult to train, in recent years multiple works (Mnih et al., 2014; Choi et al., 2017; Yu et al., 2017; Narayan et al., 2018; Sathish et al., 2018; Shen et al., 2018) have been shown to improve the performance of several natural language processing tasks. RL has also been used in NMT (Wu et al., 2017; Bahdanau et al., 2017) to overcome the inconsistency between the token-level objective function and sequence-level evaluation metrics such as BLEU. Our approach is also related to the method proposed by Lei et al. (2016) to explain the decisions of a text classifier.…”
Section: Related Work
confidence: 99%