The first step in Automatic Speech Recognition (ASR) is a fixed-rate segmentation of the acoustic signal into overlapping windows of fixed length. Although this procedure achieves excellent recognition accuracy, it is far from computationally efficient, in that it may produce a highly redundant signal (i.e., almost identical spectral vectors may span many observation windows), which translates into computational overhead. Reducing this overhead can be highly beneficial for applications such as offline ASR on mobile devices. In this paper we present a principled way to save numerical operations during ASR by using conditional-computation methods in deep bidirectional Recurrent Neural Networks (RNNs) for acoustic modelling. The methods rely on learned binary neurons that allow hidden layers to be updated only when necessary or to keep their previous value. We (i) evaluate, for the first time, conditional-computation-based recurrent architectures on a speech recognition task, and (ii) propose a novel model specifically designed for speech data that inherently builds a multi-scale temporal structure in the hidden layers. Results on the TIMIT dataset show that conditional mechanisms in recurrent architectures can reduce hidden layer updates by up to 40% at the cost of an approximately 20% relative increase in phone error rate.
Index Terms: speech recognition, computational efficiency, conditional computation, recurrent neural network.
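To make the gating idea concrete, below is a minimal sketch (not the authors' exact model) of a conditional-update recurrent cell: a learned binary neuron decides, per frame, whether the hidden layer is recomputed or simply copies its previous value. The class names (`BinaryGate`, `ConditionalGRUCell`), the GRU backbone, the 0.5 threshold, and the straight-through gradient estimator are all illustrative assumptions, not details taken from the paper.

```python
# Sketch of conditional computation in an RNN: a binary "update" neuron
# gates whether the hidden state is recomputed at each time step.
import torch
import torch.nn as nn


class BinaryGate(torch.autograd.Function):
    """Round a probability to {0, 1} in the forward pass; pass the
    gradient through unchanged in the backward pass (straight-through
    estimator, a common choice for training binary neurons)."""

    @staticmethod
    def forward(ctx, p):
        return (p > 0.5).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


class ConditionalGRUCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        # One scalar update probability per step, from input and state.
        self.gate = nn.Linear(input_size + hidden_size, 1)

    def forward(self, x, h):
        p = torch.sigmoid(self.gate(torch.cat([x, h], dim=-1)))
        u = BinaryGate.apply(p)            # u in {0, 1}, shape (batch, 1)
        h_new = self.cell(x, h)
        # u == 1: update the hidden state; u == 0: keep the old value.
        # (At inference time, u == 0 lets the GRUCell call be skipped,
        # which is where the computational savings come from.)
        return u * h_new + (1.0 - u) * h, u


if __name__ == "__main__":
    cell = ConditionalGRUCell(input_size=40, hidden_size=128)
    x = torch.randn(8, 50, 40)             # (batch, frames, acoustic features)
    h = torch.zeros(8, 128)
    updates = 0.0
    for t in range(x.size(1)):
        h, u = cell(x[:, t], h)
        updates += u.mean().item()
    print(f"fraction of frames with a hidden-state update: {updates / 50:.2f}")
```

Since nearly identical spectral vectors span consecutive windows, the gate can learn to fire on only a fraction of frames, which is the redundancy the abstract's 40% update-reduction figure exploits.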
We address the problem of reconstructing articulatory movements, given audio and/or phonetic labels. The scarce availability of multi-speaker articulatory data makes it difficult to learn a reconstruction that generalizes to new speakers and across datasets. We first consider the XRMB dataset, where audio, articulatory measurements, and phonetic transcriptions are available. We show that phonetic labels, used as input to deep recurrent neural networks that reconstruct articulatory features, are in general more helpful than acoustic features in both matched and mismatched training-testing conditions. In a second experiment, we test a novel approach that attempts to build articulatory features from prior articulatory information extracted from phonetic labels. This approach recovers vocal tract movements directly from an acoustic-only dataset without using any articulatory measurement. Results show that articulatory features generated by this approach can reach a Pearson product-moment correlation of up to 0.59 with measured articulatory features.
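The label-to-articulation direction can be sketched as a frame-level sequence regression. The following is an assumed setup, not the paper's exact architecture: the model name `Phone2Articulation`, the phone inventory size, the embedding and hidden dimensions, and the 16 articulatory channels are all hypothetical; the per-channel Pearson correlation mirrors the metric reported in the abstract.

```python
# Sketch: frame-level phone labels -> embedding -> bidirectional GRU ->
# regressed articulatory trajectories, scored by Pearson correlation.
import torch
import torch.nn as nn


class Phone2Articulation(nn.Module):
    def __init__(self, num_phones=61, emb_dim=64, hidden=128, n_artic=16):
        super().__init__()
        self.emb = nn.Embedding(num_phones, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_artic)

    def forward(self, phones):                 # (batch, frames) int labels
        h, _ = self.rnn(self.emb(phones))
        return self.out(h)                     # (batch, frames, n_artic)


def pearson(pred, target, eps=1e-8):
    """Per-channel Pearson correlation over time, averaged over channels."""
    p = pred - pred.mean(dim=1, keepdim=True)
    t = target - target.mean(dim=1, keepdim=True)
    r = (p * t).sum(1) / (p.pow(2).sum(1).sqrt() * t.pow(2).sum(1).sqrt() + eps)
    return r.mean()


if __name__ == "__main__":
    model = Phone2Articulation()
    phones = torch.randint(0, 61, (4, 200))    # 4 utterances, 200 frames each
    target = torch.randn(4, 200, 16)           # placeholder articulatory traces
    loss = nn.functional.mse_loss(model(phones), target)
    loss.backward()                            # standard regression training step
    print("Pearson r on random data:", pearson(model(phones).detach(), target).item())
```

A bidirectional network fits this task naturally, since an articulator's position within a phone depends on both the preceding and the following phonetic context (coarticulation).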