Interspeech 2020
DOI: 10.21437/interspeech.2020-1963

End-to-End Neural Transformer Based Spoken Language Understanding

Abstract: Spoken language understanding (SLU) refers to the process of inferring semantic information from audio signals. While neural transformers consistently deliver the best performance among state-of-the-art neural architectures in the field of natural language processing (NLP), their merits in the closely related field of spoken language understanding (SLU) have not been investigated. In this paper, we introduce an end-to-end neural transformer-based SLU model that can predict the variable-length domain, i…
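The truncated abstract describes a sequence-to-sequence formulation: a transformer encoder consumes acoustic features and a transformer decoder emits a variable-length sequence of semantic labels. Below is a minimal, hypothetical PyTorch sketch of that idea; the class name TransformerSLU, the log-mel inputs, the learned positional embeddings, and all dimensions are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class TransformerSLU(nn.Module):
    """Hypothetical end-to-end SLU sketch: audio features in, semantic tokens out."""

    def __init__(self, n_mels=80, d_model=256, nhead=4, num_layers=4,
                 vocab_size=128, max_len=2000):
        super().__init__()
        self.input_proj = nn.Linear(n_mels, d_model)   # project filterbank frames
        self.pos = nn.Embedding(max_len, d_model)      # learned positional embeddings
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers)
        dec = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec, num_layers)
        self.embed = nn.Embedding(vocab_size, d_model) # domain/intent/slot tokens
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, feats, tokens):
        # feats:  (batch, frames, n_mels) log-mel features from the audio signal
        # tokens: (batch, seq) right-shifted semantic-label sequence (teacher forcing)
        pos_f = self.pos(torch.arange(feats.size(1), device=feats.device))
        memory = self.encoder(self.input_proj(feats) + pos_f)
        pos_t = self.pos(torch.arange(tokens.size(1), device=tokens.device))
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.decoder(self.embed(tokens) + pos_t, memory, tgt_mask=causal)
        return self.out(hidden)  # (batch, seq, vocab_size) logits

# Smoke test on random tensors: 2 utterances of 100 frames, 5 target labels each.
model = TransformerSLU()
logits = model(torch.randn(2, 100, 80), torch.randint(0, 128, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 128])
```

A real system would add frame subsampling, beam-search decoding, and pre-training; the citation statements below single out exactly these points (pre-training and multi-task support) as differentiators among E2E SLU models.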

Cited by 46 publications (31 citation statements) · References 28 publications
“…Set 2 [Light pretraining], Experiments (5)-(8). In this category, the encoder and decoder have their components initialized with models trained on the ATIS dataset.…”
Section: Methods (mentioning)
confidence: 99%
“…More recently, other RNN-based seq2seq models have been proposed by [12], highlighting the importance of model pre-training. The first Transformer-based seq2seq model for E2E SLU was introduced in [6]; however, the authors used an architecture which supports neither multi-task learning nor model pre-training.…”
Section: Introduction (mentioning)
confidence: 99%
“…SOTA transformer: a state-of-the-art end-to-end SLU model fully based on transformers [20], which was evaluated on the Fluent Speech Commands dataset. We compare to the best results in [20] from its classification-based model. SOTA RNN: a state-of-the-art end-to-end model with a bidirectional RNN encoder, presented in [18] and designed for the Fluent Speech Commands dataset.…”
Section: SincNet/DFSMN-Transformer (mentioning)
confidence: 99%
“…Transformers [21] are powerful neural architectures that have lately been used with great success in ASR [22,23,24], SLU [25], and other audio-visual applications [26], mainly due to their attention mechanism. Only recently has the attention concept also been applied to beamforming, specifically for speech and noise mask estimation [9,27].…”
Section: Introduction (mentioning)
confidence: 99%