2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8461718

Spoken Language Understanding without Speech Recognition

Cited by 85 publications (73 citation statements)
References 15 publications

“…Work in progress. end-to-end architectures capable of learning how to map sequences of acoustic features directly to SLU recognition units [5,6,7,8]. SLU units that are typically used are combinations of ASR-level units (e.g.…”
Section: Introduction (mentioning)
confidence: 99%
“…intents, slots) [9,10]. Two-step training approaches have also been proposed, where the network is pretrained on large datasets using ASR-level recognition units, and it is subsequently finetuned on the target dataset using NLU-level recognition units [7,11].…”
Section: Introduction (mentioning)
confidence: 99%
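
The two excerpts above (continuations of the same citing paper) describe end-to-end models that map acoustic feature sequences to SLU recognition units, and a two-step recipe that pretrains on ASR-level units before fine-tuning on NLU-level units. The sketch below illustrates that recipe in PyTorch; the encoder architecture, class counts, and names such as AcousticEncoder are illustrative assumptions, not the cited papers' exact models.

```python
import torch
import torch.nn as nn

class AcousticEncoder(nn.Module):
    """Bidirectional GRU over frame-level acoustic features (e.g. filterbanks)."""
    def __init__(self, feat_dim=40, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)

    def forward(self, feats):              # feats: (batch, frames, feat_dim)
        out, _ = self.rnn(feats)           # out: (batch, frames, 2*hidden)
        return out

encoder = AcousticEncoder()

# Step 1 (pretraining): attach an ASR-level head (here 45 phoneme classes, assumed)
# and train encoder + head on a large ASR corpus.
asr_head = nn.Linear(2 * 256, 45)
# ... pretraining loop over the large ASR dataset would go here ...

# Step 2 (fine-tuning): drop the ASR head, attach an NLU-level head
# (here 31 intents, assumed) and fine-tune on the smaller target SLU dataset.
intent_head = nn.Linear(2 * 256, 31)

def predict_intent(feats: torch.Tensor) -> torch.Tensor:
    h = encoder(feats)                     # reuse the pretrained encoder
    pooled = h.mean(dim=1)                 # average over time
    return intent_head(pooled).argmax(dim=-1)

print(predict_intent(torch.randn(2, 300, 40)).shape)  # -> torch.Size([2])
```
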
“…Nowadays there is a growing research interest in end-to-end systems for various SLU tasks [23][24][25][26][27][28][29][30][31]. In this work, similarly to [26,29], end-to-end training of signal-to-concept models is performed through the recurrent neural network (RNN) architecture and the connectionist temporal classification (CTC) loss function [32] as shown in Figure 1.…”
Section: End-to-end Signal-to-concept Neural Architecture (mentioning)
confidence: 99%
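
The excerpt above describes end-to-end training of signal-to-concept models with a recurrent network and the CTC loss. The following is a minimal sketch of that setup in PyTorch; the layer sizes, batch shapes, and concept vocabulary size are assumptions for illustration and do not reproduce the cited system.

```python
import torch
import torch.nn as nn

feat_dim, hidden, n_concepts = 40, 256, 100   # concept vocabulary size is assumed

rnn = nn.LSTM(feat_dim, hidden, num_layers=3,
              batch_first=True, bidirectional=True)
proj = nn.Linear(2 * hidden, n_concepts + 1)  # +1 output for the CTC blank symbol
ctc_loss = nn.CTCLoss(blank=n_concepts, zero_infinity=True)

# Dummy batch: 8 utterances of 300 frames, each labelled with 12 concept tokens.
feats = torch.randn(8, 300, feat_dim)
targets = torch.randint(0, n_concepts, (8, 12))
feat_lens = torch.full((8,), 300, dtype=torch.long)
target_lens = torch.full((8,), 12, dtype=torch.long)

out, _ = rnn(feats)                            # (batch, frames, 2*hidden)
log_probs = proj(out).log_softmax(dim=-1)      # (batch, frames, n_concepts + 1)
loss = ctc_loss(log_probs.transpose(0, 1),     # CTCLoss expects (frames, batch, classes)
                targets, feat_lens, target_lens)
loss.backward()
```
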
“…The use of end-to-end models for spoken language understanding (SLU) is beginning to be given more serious consideration [1][2][3][4]. Whereas conventional SLU uses an automatic speech recognition (ASR) component to transcribe the audio into text and a natural language understanding (NLU) component to map the text to semantics, an end-to-end model maps the audio directly to the semantics [5][6][7]. End-to-end models have several advantages over the conventional SLU setup: they have reduced computational requirements and software implementation complexity, avoid downstream errors due to incorrect transcripts, can have the entire set of model parameters optimized for the ultimate performance criterion (semantic accuracy) as opposed to a surrogate criterion (word error rate), and can take advantage of information present in the speech signal but not in the transcript, such as prosody.…”
Section: Introduction (mentioning)
confidence: 99%
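
The excerpt above contrasts the conventional ASR-plus-NLU pipeline with a single end-to-end model that maps audio directly to semantics. The schematic below illustrates that contrast with placeholder stubs; asr, nlu, and the returned semantics dictionary are hypothetical stand-ins, not a real API or the cited authors' models.

```python
from typing import Dict, List

def asr(audio: List[float]) -> str:
    """Stand-in ASR component: audio -> transcript (trained against word error rate)."""
    return "turn on the kitchen lights"

def nlu(transcript: str) -> Dict[str, str]:
    """Stand-in NLU component: transcript -> semantics; ASR errors propagate here."""
    return {"intent": "lights_on", "location": "kitchen"}

def pipeline_slu(audio: List[float]) -> Dict[str, str]:
    # Conventional setup: two separately trained stages with text in between.
    return nlu(asr(audio))

def end_to_end_slu(audio: List[float]) -> Dict[str, str]:
    # End-to-end setup: one model maps audio directly to semantics and can be
    # optimized for semantic accuracy, also using cues (e.g. prosody) that never
    # reach a transcript. A constant is returned here purely for illustration.
    return {"intent": "lights_on", "location": "kitchen"}

print(pipeline_slu([0.0] * 16000), end_to_end_slu([0.0] * 16000))
```
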