ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8683274
Joint On-line Learning of a Zero-shot Spoken Semantic Parser and a Reinforcement Learning Dialogue Manager

Abstract: Despite many recent advances in the design of dialogue systems, a true bottleneck remains the acquisition of the data required to train their components. Unlike many other language processing applications, dialogue systems require interaction with users, which makes it complex to develop them from pre-recorded data. Building on previous work, on-line learning is pursued here as the most convenient way to address the issue. Data collection, annotation and use in learning algorithms are performed in a single process…

Cited by 1 publication (3 citation statements)
References 23 publications
“…It is worth mentioning that in complementary experiments from our prior work [16] the results obtained after on-line training seem to suffer of great variability, depending on the choices made by the expert training the system. The experts have a large margin of action in how they train their system: for instance, they can decide to locally reward only the correct actions (positively), or reversely, only the bad ones (negatively) or ideally, but more costly, both.…”
Section: Discussion
confidence: 97%
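The three reward schemes the experts may choose from can be sketched as follows; this is a minimal illustration of the design space described in the excerpt, and the function names and reward values are assumptions, not the authors' implementation.

```python
# Hypothetical local reward schemes an expert may apply when assessing a
# system action on-line (illustrative only, not the paper's actual code).

def reward_positive_only(action_correct: bool) -> float:
    # Reward only correct actions; incorrect ones get no signal.
    return 1.0 if action_correct else 0.0

def reward_negative_only(action_correct: bool) -> float:
    # Penalise only incorrect actions; correct ones get no signal.
    return -1.0 if not action_correct else 0.0

def reward_both(action_correct: bool) -> float:
    # Costlier for the expert, but every turn carries a learning signal.
    return 1.0 if action_correct else -1.0
```

The variability noted above follows directly from this choice: under `reward_positive_only` or `reward_negative_only`, half of the turns carry no learning signal at all, so two experts training the same system can produce quite different policies.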
“…Therefore, an enhanced version of the system, referred to as trained hereafter, is obtained by replacing the initial SP module and the handcrafted dialogue manager policy by on-line learnt ones. The learning protocol proposed to achieve it, referred to as on-line training below, directly juxtaposes an adversarial bandit to learn the SP module and a Q-learner reinforcement learning approach to learn the dialogue manager policy following our prior work [16]. The knowledge base of the SP module as well as the DM policy are adapted after each dialogue turn.…”
Section: Discussion
confidence: 99%
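The per-turn adaptation of the dialogue manager policy described above can be sketched as a tabular Q-learning update. This is a generic illustration of the technique named in the excerpt, assuming a minimal sketch with hypothetical state/action labels and learning-rate and discount values not taken from the paper.

```python
from collections import defaultdict

# Illustrative hyperparameters (assumptions, not the paper's settings).
ALPHA = 0.1   # learning rate
GAMMA = 0.9   # discount factor

def q_update(Q, state, action, reward, next_state, next_actions):
    """One tabular Q-learning update, applied after each dialogue turn."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Example turn with hypothetical dialogue states and actions.
Q = defaultdict(float)
q_update(Q, "greet", "ask_slot", 1.0, "slot_filled", ["confirm", "ask_slot"])
```

After this single turn, `Q[("greet", "ask_slot")]` moves from 0.0 toward the observed reward by a step of `ALPHA`, which is how the policy is adapted incrementally without waiting for a full dialogue to complete.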