Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)
DOI: 10.18653/v1/s17-1008

Deep Active Learning for Dialogue Generation

Abstract: We propose an online, end-to-end, neural generative conversational model for open-domain dialogue. It is trained using a unique combination of offline two-phase supervised learning and online human-in-the-loop active learning. While most existing research proposes offline supervision or hand-crafted reward functions for online reinforcement, we devise a novel interactive learning mechanism based on Hamming-diverse beam search for response generation and one-character user feedback at each step. Experiments show t…
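To make the decoding mechanism in the abstract concrete, the following is a minimal Python sketch of Hamming-diverse (group) beam search over a toy next-token model; the toy distribution, group count, beam width, and penalty weight lam are illustrative assumptions rather than the paper's implementation.

```python
# Minimal sketch of Hamming-diverse (group) beam search over a toy next-token
# model. The toy distribution, group count, beam width, and penalty weight are
# illustrative assumptions, not the paper's actual decoder.
import math
from collections import Counter

VOCAB = ["hi", "hello", "how", "are", "you", "there", "<eos>"]

def toy_logprobs(prefix):
    """Stand-in for a seq2seq decoder: log-probabilities of the next token."""
    scores = {t: (0.5 if t in prefix else 1.0) for t in VOCAB}
    z = sum(scores.values())
    return {t: math.log(s / z) for t, s in scores.items()}

def hamming_diverse_beam_search(groups=3, width=2, steps=4, lam=0.5):
    # One beam list per group; each hypothesis is (tokens, cumulative log-prob).
    beams = [[([], 0.0)] for _ in range(groups)]
    for _ in range(steps):
        chosen_now = Counter()  # tokens emitted at this time step by earlier groups
        for g in range(groups):
            candidates = []
            for tokens, score in beams[g]:
                for tok, lp in toy_logprobs(tokens).items():
                    # Hamming diversity: penalise a token in proportion to how
                    # often previously expanded groups already chose it here.
                    penalty = lam * chosen_now[tok]
                    candidates.append((tokens + [tok], score + lp - penalty))
            beams[g] = sorted(candidates, key=lambda c: c[1], reverse=True)[:width]
            for tokens, _ in beams[g]:
                chosen_now[tokens[-1]] += 1
    # Return the best hypothesis from each group.
    return [max(b, key=lambda c: c[1]) for b in beams]

for tokens, score in hamming_diverse_beam_search():
    print(" ".join(tokens), round(score, 3))
```

The Hamming penalty only discounts tokens that earlier groups already emitted at the same time step, which is what pushes the groups toward distinct candidate responses.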


Cited by 40 publications (31 citation statements)
References 23 publications
“…, 2014). We experimented with various beam sizes (Graves, 2012), but greedy decoding performed better according to all metrics, as also observed previously (Asghar et al., 2017; Shao et al., 2017; Tandon et al., 2017).…”
Section: Dataset (supporting)
confidence: 59%
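As a reference point for the decoding comparison in the statement above, here is a minimal, self-contained sketch of greedy decoding, i.e. beam size 1 with an argmax at every step; the toy next-token distribution is an illustrative stand-in for a trained decoder.

```python
# Minimal sketch of greedy decoding: take the argmax token at each step instead
# of maintaining a beam. The toy next-token distribution is an illustrative
# stand-in, not a trained model.
import math

VOCAB = ["hi", "hello", "how", "are", "you", "<eos>"]

def toy_logprobs(prefix):
    # Illustrative stand-in for a decoder's next-token distribution.
    scores = {t: (0.5 if t in prefix else 1.0) for t in VOCAB}
    z = sum(scores.values())
    return {t: math.log(s / z) for t, s in scores.items()}

def greedy_decode(max_steps=5):
    tokens = []
    for _ in range(max_steps):
        dist = toy_logprobs(tokens)              # next-token log-probs
        tokens.append(max(dist, key=dist.get))   # argmax: no beam is kept
        if tokens[-1] == "<eos>":
            break
    return tokens

print(" ".join(greedy_decode()))
```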
“…As opposed to offline learning, [87] proposed an interactive reinforcement learning mechanism whereby, for each answer generated by the model, user feedback is obtained and fed back to the model to update its parameters based on a single question-answer pair. This method of updating model parameters from a single example is called one-shot learning [88].…”
Section: Deep Reinforcement Learning (DRL) (mentioning)
confidence: 99%
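To illustrate the single-example update loop described in this statement, here is a minimal sketch in which one-character feedback on a generated reply triggers one parameter update on that question-answer pair alone; the bag-of-words scorer and the feedback-to-reward mapping are assumptions for illustration only.

```python
# Minimal sketch of an online, single-example update: each generated reply
# receives one-character user feedback, which is turned into a reward and used
# for one parameter update on that question-answer pair alone. The bag-of-words
# scorer (which ignores the question) is an illustrative assumption.
import math

VOCAB = ["hi", "hello", "how", "are", "you", "fine", "thanks"]
weights = {t: 0.0 for t in VOCAB}   # toy response-scorer parameters
LEARNING_RATE = 0.1

def score(response_tokens):
    """Scores a candidate response with a bag-of-words linear model."""
    s = sum(weights[t] for t in response_tokens if t in weights)
    return 1.0 / (1.0 + math.exp(-s))          # squash to (0, 1)

def one_shot_update(question, response_tokens, feedback_char):
    """One update step on a single question-answer pair.

    feedback_char: '+' (good) maps to reward +1, anything else to -1.
    The toy scorer ignores the question; it is kept to mirror the QA pair.
    """
    reward = 1.0 if feedback_char == "+" else -1.0
    p = score(response_tokens)
    grad = reward * (1.0 - p)                  # reward-scaled gradient signal
    for t in response_tokens:
        if t in weights:
            weights[t] += LEARNING_RATE * grad

# Example interaction: the model proposes a reply, the user types one character.
candidate = ["how", "are", "you"]
one_shot_update("hi there", candidate, "+")
print(round(score(candidate), 3))              # score rises after '+' feedback
```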
“…Several heuristic criteria are proposed in (Li et al., 2016a,b) as objectives to optimize. Asghar et al. (2017) propose a human-in-the-loop approach to select the best response out of a few generated candidates. Cheng et al. (2018) use an additional input signal, the specificity level of a response, which is estimated by certain heuristics at training time and can be varied during evaluation.…”
Section: Related Work (mentioning)
confidence: 99%
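As a rough illustration of the human-in-the-loop candidate selection attributed to Asghar et al. (2017) in the statement above, here is a small sketch in which a single keypress chooses among a few candidate replies; the candidate list and the input handling are assumptions, not the cited system's interface.

```python
# Rough illustration of human-in-the-loop response selection: the system shows
# a few candidate replies and a single keypress picks the winner. The candidate
# list and the input handling are illustrative assumptions.
def pick_best(candidates):
    """Prints numbered candidates and returns the one chosen with one character."""
    for i, c in enumerate(candidates, start=1):
        print(f"{i}: {c}")
    choice = input(f"Pick 1-{len(candidates)}: ").strip()
    idx = int(choice) - 1 if choice.isdigit() else 0    # default to first candidate
    return candidates[max(0, min(idx, len(candidates) - 1))]

if __name__ == "__main__":
    replies = ["hello there", "how are you doing", "nice to meet you"]
    print("chosen:", pick_best(replies))
```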