2021
DOI: 10.48550/arxiv.2103.06370
Preprint

Causal-aware Safe Policy Improvement for Task-oriented dialogue

citations
Cited by 1 publication
(1 citation statement)
references
References 12 publications
“…Sequicity [22] is an example of Seq2Seq-based models applied to dialogue systems; DAMD [23] extended a single-domain dialogue system to multiple domains; LABES-S2S [24] attempted semi-supervised learning. Third, several studies have explored the application of reinforcement learning in dialogue systems, including models such as JOUST [25], LAVA [26], DORA [27], SUMBT+LaRL [28], and CASPI [29]. With the advent of pre-trained language models, models such as DoTS [30] used Bidirectional Encoder Representations from Transformers (BERT) and Gated Recurrent Unit (GRU) for dialogue state tracking.…”
Section: Task-oriented Dialogue System
Citation type: mentioning
confidence: 99%