2019
DOI: 10.48550/arxiv.1910.00486
Preprint

Dialogue Transformers

Abstract: We introduce a dialogue policy based on a transformer architecture [1], where the self-attention mechanism operates over the sequence of dialogue turns. Recent work has used hierarchical recurrent neural networks to encode multiple utterances in a dialogue context, but we argue that a pure self-attention mechanism is more suitable. By default, an RNN assumes that every item in a sequence is relevant for producing an encoding of the full sequence, but a single conversation can consist of multiple overlapping dis…


Cited by 7 publications (9 citation statements)
References 20 publications
“…Apart from NLU tasks, the key dialogue management task is to select the most appropriate system response depending on the context. Rasa provides a Transformer Embedding Dialogue Policy (TED) component [12] to handle this task.…”
Section: A Transformer-based Dialogue Processing In Rasa
confidence: 99%
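
The response-selection task this snippet refers to can be illustrated with a minimal sketch. This is not Rasa's actual API; the function `select_next_action`, the action names, and the random embeddings are hypothetical stand-ins. The idea is only that the dialogue context is encoded as a vector and candidate system actions are ranked by their similarity to it.

```python
import numpy as np

def select_next_action(context_embedding, action_embeddings):
    """Pick the system action whose embedding is most similar
    (cosine similarity) to the current dialogue-context embedding."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    scores = {name: cosine(context_embedding, emb)
              for name, emb in action_embeddings.items()}
    return max(scores, key=scores.get)

# Toy usage: a 4-dimensional context vector and three hypothetical actions.
rng = np.random.default_rng(0)
context = rng.normal(size=4)
actions = {"utter_greet": rng.normal(size=4),
           "utter_ask_howcanhelp": rng.normal(size=4),
           "action_restart": rng.normal(size=4)}
print(select_next_action(context, actions))
```

In a trained system the context and action embeddings would come from learned encoders rather than random vectors; the ranking step itself stays the same.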
“…One of the major advantages of transformers is that, thanks to the self-attention mechanism, they make predictions at different dialogue stages independently of one another. This mechanism learns sentence representations by comparing different positions of the sentence and allows pre-selecting the tokens that affect the current state of the encoder [48,49]. The authors of [50] developed the Recurrent Embedding Dialogue Policy (REDP) architecture, which utilizes the attention mechanism to achieve better performance when recovering from unexpected dialogue inputs.…”
Section: Dialogue Management Module
confidence: 99%
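
A minimal sketch of the self-attention mechanism referred to above, assuming a single head and omitting the learned query/key/value projections for brevity: each dialogue turn's new representation is a weighted sum over all turns, so irrelevant turns can receive near-zero weight.

```python
import numpy as np

def self_attention(turns):
    """Scaled dot-product self-attention over a sequence of turn vectors.

    turns: array of shape (num_turns, dim); each row encodes one dialogue turn.
    Returns an array of the same shape in which every turn's representation is
    a softmax-weighted sum of all turns, weighted by dot-product similarity.
    """
    d = turns.shape[-1]
    scores = turns @ turns.T / np.sqrt(d)           # (num_turns, num_turns)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the history
    return weights @ turns

# Toy usage: five dialogue turns encoded as 8-dimensional vectors.
history = np.random.default_rng(1).normal(size=(5, 8))
contextualised = self_attention(history)
print(contextualised.shape)  # (5, 8)
```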
“…The authors of [50] developed the Recurrent Embedding Dialogue Policy (REDP) architecture, which utilizes the attention mechanism to achieve better performance when recovering from unexpected dialogue inputs. In their work [48], Vlasov et al. simplified the REDP architecture and introduced the TED policy. The TED policy maximizes a similarity function while jointly training embeddings for the dialogue state and the system actions.…”
Section: Dialogue Management Module
confidence: 99%
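
The joint-embedding objective described in this snippet can be sketched roughly as follows. This is a paraphrase, not the authors' exact loss; `similarity_loss` and the use of randomly sampled negative actions are illustrative assumptions. The dialogue-state embedding is trained to have a higher dot-product similarity with the correct action's embedding than with negative candidates.

```python
import numpy as np

def similarity_loss(state_emb, correct_action_emb, negative_action_embs):
    """Softmax cross-entropy over dot-product similarities: the loss is low
    when the dialogue-state embedding is closer to the correct action's
    embedding than to the sampled negative actions."""
    positive = np.array([state_emb @ correct_action_emb])   # shape (1,)
    negatives = negative_action_embs @ state_emb            # shape (num_neg,)
    logits = np.concatenate([positive, negatives])
    log_softmax = logits - np.log(np.exp(logits).sum())
    return float(-log_softmax[0])  # negative log-probability of the correct action

# Toy usage with random 16-dimensional embeddings and 4 negative actions.
rng = np.random.default_rng(2)
state = rng.normal(size=16)
correct = rng.normal(size=16)
negatives = rng.normal(size=(4, 16))
print(similarity_loss(state, correct, negatives))
```

Minimizing this quantity over many dialogue turns pushes state and action embeddings into a shared space where the similarity ranking used at prediction time becomes meaningful.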
“…Song et al. employ adversarial training to improve both the quality and diversity of generated texts [39]. Recently, the Transformer encoder-decoder framework [42] has also been employed in text generation models [43] to boost coherence.…”
Section: Neural Text Generation
confidence: 99%