2019 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)
DOI: 10.1109/sped.2019.8906584
Towards End-to-End spoken intent recognition in smart home

Abstract: … on clean transcriptions, whereas ASR transcriptions contain errors that reduce overall performance. Although the pipeline approach is widely adopted, there is rising interest in end-to-end (E2E) SLU, which combines ASR and NLU in one model, avoiding the cumulative ASR and NLU errors of the pipeline approach [2], [3]. The main motivation for applying the E2E approach is that word-by-word recognition is not needed to infer intents. On top of that, the phoneme dictionary and language model (LM) of the ASR become…
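The cascade-of-errors argument can be made concrete with a toy back-of-the-envelope model (hypothetical numbers and function, not from the paper): if intent prediction depends on a few keywords being transcribed correctly, word-level ASR errors compound before the NLU component ever runs.

```python
def pipeline_intent_accuracy(asr_word_acc: float, nlu_acc: float,
                             keywords_per_utterance: int = 2) -> float:
    """Toy model of the cascade-of-errors effect: the pipeline gets the
    intent right only if the ASR transcribes every intent-bearing
    keyword correctly AND the NLU then classifies correctly."""
    p_all_keywords_correct = asr_word_acc ** keywords_per_utterance
    return p_all_keywords_correct * nlu_acc

# With 85% word accuracy and a 95%-accurate NLU, the pipeline tops out
# well below either component; an E2E model that infers the intent
# directly from speech is not bounded by word-level accuracy this way.
print(round(pipeline_intent_accuracy(0.85, 0.95), 3))  # 0.686
```

This is only an upper-bound sketch under an independence assumption; the quoted citation statements below make the same point empirically.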

Cited by 13 publications (15 citation statements)
References 18 publications
“…On top of that, our correlation tests in Section 5.3 showed that perfect ASR is not necessary to obtain good E2E SLU performance. It is, however, essential in the case of a pipeline approach as we have demonstrated in Desot et al (2019b) for intent prediction and in Desot et al (2019a) for concept prediction. This answers our first question and confirms the state-of-the-art: the E2E model reduces the cascade of error effect.…”
Section: Discussion
confidence: 99%
“…As in Desot et al (2019b), the ASR component of our pipeline SLU is the Kaldi tool, nnet2 version. This neural-network ASR training framework allows training with large amounts of data using multiple GPUs or multi-core machines.…”
Section: Baseline Pipeline SLU
confidence: 99%
“…Broadly speaking, E2ESLU systems have still not superseded the performance of well-trained multi-step SLU systems. While achieving high accuracies without large amounts of training data is hard in general E2E systems, with additional techniques like pre-training [14], artificial data generation [5] or transfer learning [27], a comparable performance to state-of-the-art ASR-NLU pipeline systems can be achieved with E2ESLU [23,10].…”
Section: Spoken Sentence Intent Representation
confidence: 99%
“…SLU end-to-end systems are usually trained to generate both recognized words and semantic tags [18,15].…”
Section: End-to-end Approach
confidence: 99%
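As a rough illustration of that joint target format (a hypothetical sequence builder, not the cited papers' actual tokenization), an E2E SLU training target can interleave semantic tag tokens with the recognized words:

```python
def build_joint_target(words: list[str], tags: list[str]) -> list[str]:
    """Build a target sequence for an E2E SLU model trained to emit
    both recognized words and semantic tags: each word is followed by
    its tag token, with 'O' (no concept) tags dropped for brevity."""
    target = []
    for word, tag in zip(words, tags):
        target.append(word)
        if tag != "O":
            target.append(f"<{tag}>")
    return target

print(build_joint_target(
    ["turn", "on", "the", "kitchen", "light"],
    ["action", "action", "O", "location", "device"]))
# ['turn', '<action>', 'on', '<action>', 'the', 'kitchen',
#  '<location>', 'light', '<device>']
```

Training on such sequences lets a single decoder learn transcription and concept tagging jointly, which is the property the quoted statement attributes to E2E SLU systems.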