on clean transcriptions whereas ASR transcriptions contain errors, reducing the overall performance. Although the pipeline approach is widely adopted, there is rising interest in end-to-end (E2E) SLU, which combines ASR and NLU in one model and avoids the cumulative ASR and NLU errors of the pipeline approach [2], [3]. The main motivation for applying the E2E approach is that word-by-word recognition is not needed to infer intents. On top of that, the phoneme dictionary and language model (LM) of the ASR become optional. However, E2E approaches are highly dependent on large training data sets, which are difficult to acquire. This limits their applicability to new domains where data is scarce, as is the case for smart homes.

The main contributions of this paper are: 1) the first work on E2E SLU for voice commands in a smart home environment; 2) a comparison of a state-of-the-art pipeline approach that predicts intents from the ASR hypothesis and an E2E SLU model; 3) experiments performed with realistic non-English and synthetic data to deal with the paucity of domain-specific data sets. Both approaches are positioned with respect to the state of the art in Section II and are outlined in Section III. We tackle the lack of domain-specific data by using Natural Language Generation (NLG) and text-to-speech (TTS) to generate French voice command training data. An overview of these processes and data sets is given in Sections III and IV. Section V presents the results of experiments on a corpus of real smart home voice commands, followed by a discussion, conclusion and outlook on future work.

II. RELATED WORK

SLU is typically seen as a slot-filling task that predicts the speaker's intent on the one hand and the entities in a spoken utterance (slots and values) on the other [1]. The most common approach is a pipeline of an ASR and an NLU module.
The ASR system outputs the hypothesis transcriptions from a speech utterance that are analyzed by the NLU module to extract the meaning. While the slot-filling task is most often

Abstract: Voice-based interaction in a smart home has become a feature of many industrial products. These systems react to voice commands, whether it is for answering a question, providing music or turning on the lights. To be efficient, these systems must be able to extract the intent of the user from the voice command. Intent recognition from voice is typically performed through automatic speech recognition (ASR) and intent classification from the transcriptions in a pipeline. However, the errors accumulated at the ASR stage might severely impact the intent classifier. In this paper, we propose an End-to-End (E2E) model to perform intent classification directly from the raw speech input. The E2E approach is thus optimized for this specific task and avoids error propagation. Furthermore, prosodic aspects of the speech signal can be exploited by the E2E model for intent classification (e.g., question vs. imperative voice). Experiments on a corpus of voice commands acquired in a real smart home reveal t...
Despite growing interest in smart homes, large semantically annotated voice command corpora for Natural Language Understanding (NLU) development are scarce, especially for languages other than English. In this paper, we present an approach to generate customizable synthetic corpora of semantically annotated French commands for a smart home. This corpus was used to train three NLU models (a triangular CRF, an attention-based RNN and the Rasa framework), which were evaluated on a small corpus of real users interacting with a smart home. While the attention model performs best on another large French dataset, on the small smart home corpus the models' performance varies across intent, slot and slot value classification. To the best of our knowledge, no other French corpus of semantically annotated voice commands is currently publicly available.
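The idea of generating a semantically annotated synthetic corpus can be illustrated with a minimal template-expansion sketch. This is not the authors' generation grammar: the template, the vocabulary, the intent name `set_device` and the slot labels are all hypothetical, chosen only to show how each generated French command can carry its intent and slot values along with the text.

```python
import itertools
import string

# Hypothetical template and vocabulary (assumptions, not the paper's grammar).
# Each intent maps to a pattern whose fields are slot names.
TEMPLATES = {
    "set_device": "{action} {device} {location}",
}

# For each slot: surface form (French) -> normalized slot value.
VOCAB = {
    "action": {"allume": "turn_on", "éteins": "turn_off"},
    "device": {"la lumière": "light", "la radio": "radio"},
    "location": {"de la cuisine": "kitchen", "de la chambre": "bedroom"},
}


def generate(templates, vocab):
    """Expand every template with every vocabulary combination.

    Returns a list of dicts holding the command text, its intent,
    and the normalized value of each slot.
    """
    corpus = []
    for intent, pattern in templates.items():
        # Extract the slot names that appear in the pattern, in order.
        slot_names = [
            field for _, field, _, _ in string.Formatter().parse(pattern) if field
        ]
        choices = (vocab[name].items() for name in slot_names)
        for combo in itertools.product(*choices):
            surface = {n: text for n, (text, _) in zip(slot_names, combo)}
            values = {n: value for n, (_, value) in zip(slot_names, combo)}
            corpus.append(
                {"text": pattern.format(**surface), "intent": intent, "slots": values}
            )
    return corpus


corpus = generate(TEMPLATES, VOCAB)
for example in corpus[:2]:
    print(example)
```

Keeping the surface form and the normalized value side by side in the vocabulary means the annotation is produced together with the text, so no separate labeling pass is needed. A corpus built this way grows multiplicatively with the vocabulary, which is what makes it customizable for a new home configuration.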