Maurizio Omologo scite author profile

A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR). Despite the great achievements of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially in challenging environments characterized by significant noise and reverberation. To improve robustness, modern speech recognizers often employ acoustic models based on Recurrent Neural Networks (RNNs), which are naturally able to exploit large time contexts and long-term speech modulations. It is thus of great interest to continue the study of proper techniques for improving the effectiveness of RNNs in processing speech signals.

In this paper, we revise one of the most popular RNN models, namely Gated Recurrent Units (GRUs), and propose a simplified architecture that turned out to be very effective for ASR. The contribution of this work is two-fold. First, we analyze the role played by the reset gate, showing that a significant redundancy with the update gate occurs. As a result, we propose to remove the former from the GRU design, leading to a more efficient and compact single-gate model. Second, we propose to replace hyperbolic tangent with ReLU activations. This variation couples well with batch normalization and could help the model learn long-term dependencies without numerical issues.

Results show that the proposed architecture, called Light GRU (Li-GRU), not only reduces the per-epoch training time by more than 30% over a standard GRU, but also consistently improves the recognition accuracy across different tasks, input features, and noisy conditions, as well as across different ASR paradigms, ranging from standard DNN-HMM speech recognizers to end-to-end CTC models.
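As a rough illustration of the single-gate design described in the abstract above, the NumPy sketch below runs one recurrent layer with an update gate only, a ReLU candidate state, and batch normalization applied to the feed-forward projections. The exact update rule, the weight names (`Wz`, `Uz`, `Wh`, `Uh`), and the simplified batch normalization (current-batch statistics, no learned scale or shift) are assumptions made for illustration; they are not given on this page.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def batch_norm(a, eps=1e-5):
    # Assumption: normalize each feature over the batch axis,
    # without the learned scale/shift a real layer would have.
    return (a - a.mean(axis=0)) / np.sqrt(a.var(axis=0) + eps)

def li_gru_step(x_t, h_prev, Wz, Uz, Wh, Uh):
    """One time step of a single-gate recurrent cell (no reset gate)."""
    # Update gate: decides how much of the previous state to keep.
    z = sigmoid(batch_norm(x_t @ Wz) + h_prev @ Uz)
    # Candidate state: ReLU instead of tanh, no reset gate on h_prev.
    h_cand = np.maximum(0.0, batch_norm(x_t @ Wh) + h_prev @ Uh)
    # Interpolate between the previous state and the candidate.
    return z * h_prev + (1.0 - z) * h_cand

# Toy dimensions, chosen only for this example.
rng = np.random.default_rng(0)
n_steps, batch, n_in, n_hid = 6, 4, 3, 5
Wz = rng.standard_normal((n_in, n_hid)) * 0.1
Wh = rng.standard_normal((n_in, n_hid)) * 0.1
Uz = rng.standard_normal((n_hid, n_hid)) * 0.1
Uh = rng.standard_normal((n_hid, n_hid)) * 0.1

h = np.zeros((batch, n_hid))
for x_t in rng.standard_normal((n_steps, batch, n_in)):
    h = li_gru_step(x_t, h, Wz, Uz, Wh, Uh)

print(h.shape)
```

Compared with a standard GRU, this cell drops one gate and its two weight matrices, which is the kind of saving the abstract attributes to the reduced training time.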

Automatic segmentation and labeling of speech based on Hidden Markov Models

Brugnara

Falavigna

Omologo

1993

Speech Communication

134

CLEAR Evaluation of Acoustic Event Detection and Classification Systems

Temko

Malkin

Zieger

et al.

Acoustic event localization using a crosspower-spectrum phase based technique

Omologo

Svaizer

182

Use of the crosspower-spectrum phase in acoustic event location

Omologo

Svaizer

1997

IEEE Trans. Speech Audio Process.

182

This correspondence reports on the use of crosspower-spectrum phase (CSP) analysis as an accurate time delay estimation (TDE) technique. It is used in a microphone array system for the location of acoustic events in noisy and reverberant environments. A corresponding coherence measure (CM) and its graphical representation are introduced to show TDE accuracy. Using a two-microphone pair array, real experiments show less than 10 cm average location error in a 6 m × 6 m area.
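To make the idea of phase-based time delay estimation concrete, here is a minimal NumPy sketch: the cross-power spectrum of two microphone signals is normalized to unit magnitude so that only its phase remains, and the peak of its inverse transform gives the delay in samples. The function name `csp_delay`, the `max_lag` parameter, and the synthetic test signal are assumptions for illustration; the abstract does not specify an implementation, and the coherence measure it mentions is not reproduced here.

```python
import numpy as np

def csp_delay(x1, x2, max_lag):
    """Estimate by how many samples x2 lags x1 (max_lag must be > 0)."""
    n = len(x1) + len(x2)              # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X2 * np.conj(X1)           # cross-power spectrum
    cross /= np.maximum(np.abs(cross), 1e-12)  # keep the phase only
    csp = np.fft.irfft(cross, n)
    # Reorder so that lags run from -max_lag to +max_lag.
    csp = np.concatenate((csp[-max_lag:], csp[:max_lag + 1]))
    return int(np.argmax(csp)) - max_lag

# Synthetic check: the second channel is the first one delayed by 7 samples.
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
true_delay = 7
mic1 = np.pad(source, (0, true_delay))
mic2 = np.pad(source, (true_delay, 0))

delay = csp_delay(mic1, mic2, max_lag=20)
print(delay)
```

Dividing the estimated delay by the sampling rate gives the delay in seconds, which a localization system can combine across microphone pairs with the known array geometry.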
