2018
DOI: 10.1007/s10772-018-9520-y

Distant speech processing for smart home: comparison of ASR approaches in scattered microphone network for voice command

Abstract: Voice command in multi-room smart homes for assisting people in loss of autonomy in their daily activities faces several challenges, one of them being the distant condition which impacts ASR performance. This paper presents an overview of multiple techniques for fusion of multi-source audio (pre, middle, post fusion) for automatic speech recognition for in-home voice command. The robustness of the models of speech is obtained by adaptation to the environment and to the task. Experiments are based on several pu…

Citations: cited by 9 publications (3 citation statements)
References: 48 publications
“…Word error rates (WER) in Table 3 show that the fMLLR and HMM-DNN models with the ESLO2 data outperform the acoustic models without it. The WER is slightly superior than a recent study using a similar approach for French voice command recognition in a smart home [35]. The NLU seq2seq model was a bi-directional LSTM encoder and decoder.…”
Section: Pipeline SLU Baseline Approach (mentioning)
confidence: 90%
“…Moreover, since voice command in the home is inherently multichannel, the system decoded the events from the two channels with the highest SNR one after the other. Decoding the first channel made it possible to weight the system's language model for decoding the second, using a technique called the Driven Decoding Algorithm (DDA) [44,45]. This choice followed a study of the possible methods with a low computational cost.…”
Section: ASR in Distant Conditions (unclassified)
“…They found that the place features are most susceptible to misperceptions in white noise, followed by manner features, and then voicing features. Interactions among the IoT devices in smart homes can be executed via user interaction techniques such as human speech [12,13].…”
Section: Related Work (mentioning)
confidence: 99%