Towards End-to-End spoken intent recognition in smart home

Desot, Thierry; Portet, François; Vacher, Michel

doi:10.1109/sped.2019.8906584

Cited by 13 publications (15 citation statements); references 18 publications

“…On top of that, our correlation tests in Section 5.3 showed that perfect ASR is not necessary to obtain good E2E SLU performance. It is, however, essential in the case of a pipeline approach as we have demonstrated in Desot et al (2019b) for intent prediction and in Desot et al (2019a) for concept prediction. This answers our first question and confirms the state-of-the-art: the E2E model reduces the cascade of error effect.…”

Section: Discussion (mentioning, confidence: 99%)

End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting

Desot, Portet, Vacher, 2022, Computer Speech & Language (self-citation)

A second citation statement from the same paper:

“…As in Desot et al (2019b), the ASR component of our pipeline SLU is the Kaldi tool, nnet2 version. This neural-network ASR training framework allows training with large amounts of data using multiple GPUs or multi-core machines.…”

Section: Baseline Pipeline SLU (mentioning, confidence: 99%)

“…Broadly speaking, E2ESLU systems have still not superseded the performance of well-trained multi-step SLU systems. While achieving high accuracies without large amounts of training data is hard in general E2E systems, with additional techniques like pre-training [14], artificial data generation [5] or transfer learning [27], a comparable performance to state-of-the-art ASR-NLU pipeline systems can be achieved with E2ESLU [23,10].…”

Section: Spoken Sentence Intent Representation (mentioning, confidence: 99%)

Low resource end-to-end spoken language understanding with capsule networks

Poncelet, Renkens, Van hamme, 2021, Computer Speech & Language

Designing a Spoken Language Understanding (SLU) system for command-and-control applications is challenging. Both Automatic Speech Recognition and Natural Language Understanding are language and application dependent to a great extent. Even with a lot of design effort, users often still have to know what to say to the system for it to do what they want. We propose to use an end-to-end SLU system that maps speech directly to semantics and that can be trained by the user through demonstrations. The user can teach the system a new command by uttering the command and subsequently demonstrating its meaning through an alternative interface. The system will learn the mapping from the spoken command to the task. The dependency on the user also allows different languages and non-standard or impaired speech as valid inputs. Teaching the system requires effort from the user, so it is crucial that the system learns quickly. In this paper we propose to use capsule networks for this task, which are believed to be data efficient. We discuss two architectures for using capsule networks. We analyse their performance and compare them with two baseline systems, one based on Non-negative Matrix Factorisation (NMF) which has been successful for this task and one encoder-decoder approach. We show that in most cases the capsule network performs better than the baseline systems. Furthermore, we demonstrate the versatility of the architecture by inferring speaker identity and the user's word choice through multitask learning.

“…SLU end-to-end systems are usually trained to generate both recognized words and semantic tags [18,15].…”

Section: End-to-end Approach (mentioning, confidence: 99%)

Where are we in semantic concept extraction for Spoken Language Understanding?

Ghannay, Caubrière, Mdhaffar et al., 2021, Preprint

Spoken language understanding (SLU) has seen a lot of progress over the last three years, with the emergence of end-to-end neural approaches. Spoken language understanding refers to natural language processing tasks related to semantic extraction from the speech signal, such as named entity recognition from speech or slot filling in the context of human-machine dialogue. Classically, SLU tasks were processed through a cascade approach that consists of first applying an automatic speech recognition process, followed by a natural language processing module applied to the automatic transcriptions. Over the last three years, end-to-end neural approaches based on deep neural networks have been proposed to extract the semantics directly from the speech signal using a single neural model. More recent work on self-supervised training with unlabeled data opens new perspectives in terms of performance for automatic speech recognition and natural language processing. In this paper, we present a brief overview of the recent advances on the French MEDIA benchmark dataset for SLU, with or without the use of additional data. We also present our latest results, which significantly outperform the current state of the art with a Concept Error Rate (CER) of 11.2%, compared with 13.6% for the last state-of-the-art system presented this year.
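The CER figures quoted in this abstract (11.2% vs. 13.6%) can be made concrete with a small sketch. It assumes CER is computed the way word error rate usually is: the Levenshtein edit distance (substitutions + deletions + insertions) between the reference and hypothesized concept sequences, divided by the number of reference concepts. The official MEDIA scoring procedure may differ in details, for example in how concept values are scored. The function names and the concept labels in the usage example are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def edit_ops(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    n, m = len(ref), len(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i reference concepts
    for j in range(m + 1):
        d[0][j] = j  # insert all j hypothesis concepts
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[n][m]


def concept_error_rate(refs, hyps) -> float:
    """Corpus-level CER (assumed definition): total edit operations over total reference concepts."""
    errors = sum(edit_ops(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def relative_reduction(old: float, new: float) -> float:
    """Fraction of the old error rate removed by the new system."""
    return (old - new) / old


if __name__ == "__main__":
    # Hypothetical concept labels, for illustration only.
    refs = [["nombre-chambre", "localisation-ville", "temps-date"], ["objet"]]
    hyps = [["nombre-chambre", "temps-date"], ["objet", "localisation-ville"]]
    print(concept_error_rate(refs, hyps))  # 1 deletion + 1 insertion over 4 concepts -> 0.5
    print(round(relative_reduction(13.6, 11.2), 3))  # -> 0.176
```

Read this way, going from 13.6% to 11.2% CER removes roughly 17.6% of concept errors in relative terms.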
