We present ADVISER, an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g. emotion recognition, engagement level prediction and backchanneling) conversational agents. The final Python-based implementation of our toolkit is flexible, easy to use, and easy to extend not only for technically experienced users, such as machine learning researchers, but also for less technically experienced users, such as linguists or cognitive scientists, thereby providing a flexible platform for collaborative research.
In this work we take on the challenge of rare word recognition in end-to-end (E2E) automatic speech recognition (ASR) by integrating a meta-learning mechanism into an E2E ASR system, enabling few-shot adaptation. We propose a novel method of generating embeddings for speech, modifications to four meta-learning approaches that enable them to perform keyword spotting, and an approach to using their outcomes in an E2E ASR system. We verify the functionality of each of our three contributions in two experiments exploring their performance for different numbers of classes (N-way) and examples per class (k-shot) in a few-shot setting. We find that the information encoded in the speech embeddings suffices to allow the modified meta-learning approaches to perform continuous signal spotting. Despite the simplicity of the interface between keyword spotting and speech recognition, we consistently improve the word error rate by up to 5%.
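To make the N-way k-shot evaluation setup concrete, the following minimal Python sketch (not taken from the paper; the synthetic data, dimensions, and the nearest-prototype classifier are illustrative assumptions) samples episodes over precomputed speech embeddings: each episode draws N keyword classes, builds a prototype from k support examples per class, and classifies held-out query embeddings by their nearest prototype.

# Hypothetical N-way k-shot keyword-classification episode over fixed
# speech embeddings. All names and data here are stand-ins, not the
# paper's actual embedding model or meta-learning methods.
import numpy as np

def run_episode(embeddings, labels, n_way=5, k_shot=1, queries_per_class=5, rng=None):
    """Sample one N-way k-shot episode and return query accuracy."""
    rng = rng if rng is not None else np.random.default_rng(0)
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)

    prototypes, query_x, query_y = [], [], []
    for idx, c in enumerate(classes):
        pool = rng.permutation(np.where(labels == c)[0])
        support = pool[:k_shot]
        queries = pool[k_shot:k_shot + queries_per_class]
        prototypes.append(embeddings[support].mean(axis=0))  # class prototype
        query_x.append(embeddings[queries])
        query_y.append(np.full(len(queries), idx))

    prototypes = np.stack(prototypes)        # shape: (n_way, dim)
    query_x = np.concatenate(query_x)        # shape: (n_queries, dim)
    query_y = np.concatenate(query_y)

    # Assign each query embedding to its nearest prototype (Euclidean distance).
    dists = np.linalg.norm(query_x[:, None, :] - prototypes[None, :, :], axis=-1)
    preds = dists.argmin(axis=1)
    return (preds == query_y).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Stand-in data: 20 keyword classes, 30 utterance embeddings each, 64-dim.
    fake_embeddings = rng.normal(size=(20 * 30, 64))
    fake_labels = np.repeat(np.arange(20), 30)
    accs = [run_episode(fake_embeddings, fake_labels, n_way=5, k_shot=1, rng=rng)
            for _ in range(100)]
    print(f"mean 5-way 1-shot accuracy over 100 episodes: {np.mean(accs):.3f}")

In the paper's setting the embeddings would come from the proposed speech-embedding method and the classifier from one of the modified meta-learning approaches; the random vectors above only illustrate the episode structure.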