Natural Interaction With Robots, Knowbots and Smartphones 2013
DOI: 10.1007/978-1-4614-8280-2_31
Co-adaptation in Spoken Dialogue Systems

Cited by 11 publications (5 citation statements) · References 11 publications
“…To address the time-consuming development for simulator policy, recent studies [10,26,27] proposed a one-to-one dialogue model where a dialogue manager and a user simulator were optimized concurrently. Different from the above studies, our proposed MADM applies the reward shaping technique [11] based on the adjacency pairs in conversational analysis [12], which can help the cooperative policies learn from scratch quickly.…”
Section: Related Work
Mentioning confidence: 99%
“…Different from the above studies, our proposed MADM applies the reward shaping technique [11] based on the adjacency pairs in conversational analysis [12], which can help the cooperative policies learn from scratch quickly. By the method of reward shaping, our proposed MADM avoids running a learning algorithm multiple times in a study [26] and collects the corpora in studies [10,27].…”
Section: Related Work
Mentioning confidence: 99%
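The quoted passages describe shaping the reward with the adjacency-pair structure of conversation so that cooperative policies can learn from scratch quickly. As a rough illustration only, the sketch below shows classic potential-based reward shaping (which preserves optimal policies) with a potential derived from whether the system's act completes the pending adjacency pair; the pair table, state layout, and function names are hypothetical, not taken from the cited MADM work.

```python
# Illustrative potential-based reward shaping driven by adjacency pairs.
# All names and the state representation are assumptions for this sketch.

GAMMA = 0.95  # discount factor

# Hypothetical adjacency pairs: user act -> system act expected in response.
ADJACENCY_PAIRS = {
    "request": "inform",
    "confirm": "affirm",
    "greeting": "greeting",
}

def potential(state):
    """Phi(s): 1.0 if the last system act completes the user's adjacency pair."""
    user_act, system_act = state  # simplified dialogue state (a 2-tuple)
    return 1.0 if ADJACENCY_PAIRS.get(user_act) == system_act else 0.0

def shaped_reward(env_reward, state, next_state):
    """R'(s, a, s') = R + gamma * Phi(s') - Phi(s).

    Potential-based shaping adds dense feedback (completing a pair is
    rewarded immediately) without changing which policies are optimal.
    """
    return env_reward + GAMMA * potential(next_state) - potential(state)
```

For example, a transition that resolves a pending "request" with an "inform" act receives an immediate positive shaping bonus even when the task-level reward is still zero.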
“…Another bottleneck in domain adaptation that has not yet been resolved is the need to predefine all actions and slots prior to learning. It is desirable for a policy to be able to dynamically adapt to new concepts discovered throughout an interaction, similar to how humans continuously evolve and learn through communication (Chandramohan et al 2014), moving further beyond multi‐domain dialogue systems to open‐domain ones.…”
Section: Adaptation
Mentioning confidence: 99%
“…We argue that if the human operator may not be optimally acting to maximize the users’ satisfaction, the users are unconsciously trying to optimize their satisfaction when interacting with a machine. IRL could, therefore, be used to learn the internal (non-observable) reward function that users naturally try to maximize (Chandramohan et al., 2011).…”
Section: Future Directions
Mentioning confidence: 99%
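The passage above proposes using inverse reinforcement learning (IRL) to recover the hidden reward function users maximize. As a minimal sketch of that idea only (not the method of the cited work), the snippet below assumes a linear reward r(s) = w · φ(s) and takes one feature-expectation projection step in the style of apprenticeship learning: the weight vector points toward the features that observed user behaviour visits more than the current policy does. Trajectory format and function names are illustrative assumptions.

```python
import numpy as np

def feature_expectations(trajectories, gamma=0.95):
    """Average discounted feature vectors over a set of trajectories.

    Each trajectory is a list of per-step feature vectors phi(s_t).
    """
    mu = np.zeros_like(np.asarray(trajectories[0][0], dtype=float))
    for traj in trajectories:
        for t, phi in enumerate(traj):
            mu += (gamma ** t) * np.asarray(phi, dtype=float)
    return mu / len(trajectories)

def reward_weights(expert_trajs, policy_trajs):
    """One projection step toward the expert's feature expectations.

    The resulting unit vector w defines r(s) = w . phi(s), which rates
    states the observed users visit more highly than states the current
    policy visits.
    """
    w = feature_expectations(expert_trajs) - feature_expectations(policy_trajs)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w
```

In a dialogue setting, φ(s) could encode observable satisfaction proxies (e.g., turn count, confirmations, task progress), and the learned w would stand in for the non-observable user reward the quote refers to.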