Multilingual end-to-end (E2E) models have shown great promise in expanding automatic speech recognition (ASR) coverage of the world's languages. They improve on monolingual systems and simplify training and serving by eliminating language-specific acoustic, pronunciation, and language models. This work presents an E2E multilingual system that is equipped to operate in low-latency interactive applications and to handle a key challenge of real-world data: the imbalance in training data across languages. Using nine Indic languages, we compare a variety of techniques and find that a combination of conditioning on a language vector and training language-specific adapter layers produces the best model. The resulting E2E multilingual model achieves a lower word error rate (WER) than both monolingual E2E models (eight of nine languages) and monolingual conventional systems (all nine languages).
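The abstract above reports its results as word error rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), and not the authors' own scoring code, the function name and the whitespace tokenization below are assumptions made for illustration:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Assumption: words are separated by whitespace; real scoring pipelines
    usually normalize text (case, punctuation) before this step.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words consumed so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many insertions.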
For decades, context-dependent phonemes have been the dominant sub-word unit for conventional acoustic modeling systems. This status quo has recently been challenged by end-to-end models, which seek to combine acoustic, pronunciation, and language model components into a single neural network. Such systems, which typically predict graphemes or words, simplify the recognition process because they remove the need for a separate expert-curated pronunciation lexicon to map from phoneme-based units to words. However, there has been little previous work comparing phoneme-based and grapheme-based sub-word units in the end-to-end modeling framework to determine whether the gains from such approaches are primarily due to the new probabilistic model or to the joint learning of the various components with grapheme-based units. In this work, we conduct detailed experiments aimed at quantifying the value of phoneme-based pronunciation lexica in the context of end-to-end models. We examine phoneme-based end-to-end models, which are contrasted against grapheme-based ones on a large-vocabulary English Voice-search task, where we find that graphemes do indeed outperform phonemes. We also compare grapheme-based and phoneme-based approaches on a multi-dialect English task, which once again confirms the superiority of graphemes, greatly simplifying the system for recognizing multiple dialects.
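The practical difference between the two sub-word units can be illustrated with a small sketch. It is not taken from the paper: the toy lexicon, the ARPAbet-style symbols, the `<space>` marker, and both function names are assumptions chosen for illustration. Grapheme targets come straight from the transcript, while phoneme targets require a lexicon entry for every word:

```python
# Hypothetical toy lexicon with ARPAbet-style symbols (stress marks omitted).
# A production system would rely on a large expert-curated lexicon instead.
TOY_LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "model": ["M", "AA", "D", "AH", "L"],
}


def grapheme_targets(transcript):
    """Build grapheme targets directly from the text; no lexicon is needed."""
    return ["<space>" if ch == " " else ch for ch in transcript.lower()]


def phoneme_targets(transcript, lexicon):
    """Build phoneme targets by looking each word up in the lexicon."""
    targets = []
    for word in transcript.lower().split():
        if word not in lexicon:
            # Out-of-lexicon words cannot be converted without extra tooling.
            raise KeyError(f"no pronunciation for {word!r}")
        targets.extend(lexicon[word])
    return targets


if __name__ == "__main__":
    print(grapheme_targets("speech model"))
    print(phoneme_targets("speech model", TOY_LEXICON))
```

The `KeyError` branch shows the dependency the abstract refers to: any word missing from the lexicon blocks the phoneme path, whereas the grapheme path has no such requirement.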
Brand extension is a marketing strategy that applies a previously established brand name to new goods or services. A number of studies have reported the characteristics of human event-related potentials (ERPs) in response to the evaluation of goods-to-goods brand extension. In contrast, human brain responses to the evaluation of service extension are relatively unexplored. The aim of this study was to investigate the cognitive processes underlying the evaluation of service-to-service brand extension with electroencephalography (EEG). A total of 56 text stimuli, each composed of a service brand name (S1) followed by an extended service name (S2), were presented to participants. The EEG of participants was recorded while they were asked to evaluate whether a given brand extension was acceptable or not. The behavioral results revealed that participants could evaluate brand extension even though they had little knowledge about the extended services, indicating the role of the brand in the evaluation of the services. Additionally, we developed a method of grouping brand extension stimuli according to the fit levels obtained from behavioral responses, instead of grouping stimuli a priori. The ERP analysis identified three components during the evaluation of brand extension: N2, P300, and N400. No difference in the N2 amplitude was found among the different levels of fit between S1 and S2. The P300 amplitude for the low level of fit was greater than those for higher levels (p < 0.05). The N400 amplitude was more negative for the mid- and high-level fits than for the low level. The ERP results for P300 and N400 indicate that the early stage of brand extension evaluation might first detect low-fit brand extension as an improbable target, followed by a late stage in which S2 is integrated into S1. Along with previous findings, our results demonstrate that the cognitive evaluation of service-to-service brand extension differs from that of goods-to-goods extension.
It has become increasingly important to monitor drivers’ negative emotions during driving to prevent accidents. Although drivers’ anxiety is critical to safe driving, there is a lack of systematic approaches to detect anxiety in driving situations. This study employed multimodal biosignals, including electroencephalography (EEG), photoplethysmography (PPG), electrodermal activity (EDA), and pupil size, to estimate anxiety under various driving situations. Thirty-one drivers, each with at least one year of driving experience, watched a set of thirty black box videos including anxiety-invoking events, and another set of thirty videos without them, while their biosignals were measured. Then, they self-reported anxiety-invoked time points in each video, from which features of each biosignal were extracted. The logistic regression (LR) method classified single biosignals to detect anxiety. Furthermore, LR classified accumulated multimodal signals, added in the order of PPG, EDA, pupil size, and EEG (from easiest to hardest to access). Classification using EEG alone showed the highest accuracy of 77.01%, while other biosignals led to a classification with accuracy no higher than the chance level. This study showed the feasibility of utilizing biosignals to detect anxiety invoked by driving situations and demonstrated the benefits of EEG over other biosignals.
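The accumulation order described above (PPG, then EDA, then pupil size, then EEG) can be sketched as a feature-building step. This is an illustration only, under stated assumptions: the function names, the dictionary layout of a sample, and the feature values are hypothetical, and the logistic regression classifier that the study trains on each accumulated vector is not shown:

```python
# Modalities ordered from easiest to hardest to access, as in the abstract.
MODALITY_ORDER = ["PPG", "EDA", "pupil", "EEG"]


def accumulated_sets(order):
    """Return the growing modality sets: [PPG], [PPG, EDA], and so on."""
    return [order[:k] for k in range(1, len(order) + 1)]


def concat_features(sample, modalities):
    """Concatenate one sample's per-modality feature lists in the given order.

    Assumption: `sample` maps a modality name to a list of numeric features.
    """
    vector = []
    for name in modalities:
        vector.extend(sample[name])
    return vector


if __name__ == "__main__":
    # Hypothetical feature values, for illustration only.
    sample = {"PPG": [0.8], "EDA": [0.1, 0.3], "pupil": [4.2], "EEG": [0.5, 0.7]}
    for modalities in accumulated_sets(MODALITY_ORDER):
        print(modalities, concat_features(sample, modalities))
```

Each printed vector would be the input to one classifier, so comparing accuracies across the four sets shows what each added modality contributes.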