Distant-microphone speech recognition systems that operate with humanlike robustness remain a distant goal. The key difficulty is that operating in everyday listening conditions entails processing a speech signal that is reverberantly mixed into a noise background composed of multiple competing sound sources. This paper describes a recent speech recognition evaluation that was designed to bring together researchers from multiple communities in order to foster novel approaches to this problem. The task was to identify keywords from sentences reverberantly mixed into audio backgrounds binaurally recorded in a busy domestic environment. The challenge was designed to model the essential difficulties of the multisource-environment problem while remaining on a scale that would make it accessible to a wide audience. Compared to previous ASR evaluations, a particular novelty of the task is that the utterances to be recognised were embedded in continuous audio backgrounds rather than presented as pre-segmented utterances, thus allowing a range of background modelling techniques to be employed. The challenge attracted thirteen submissions. This paper describes the challenge problem, provides an overview of the systems that were entered, and compares them against both a baseline recognition system and human performance. The paper discusses insights gained from the challenge and lessons learnt for the design of future evaluations of this kind.
Adaptive beamforming methods degrade in the presence of both signal steering vector errors and interference nonstationarity. We develop a new approach to adaptive beamforming that is jointly robust against these two phenomena. Our beamformer is based on the optimization of worst-case performance. A computationally efficient convex-optimization-based algorithm is proposed to compute the beamformer weights. Computer simulations demonstrate that our beamformer has improved robustness compared to other popular robust beamforming algorithms.
Index Terms: Adaptive beamforming, interference nonstationarity, robustness, second-order cone programming, steering vector errors.
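The worst-case formulation this abstract refers to is typically cast as a second-order cone program: minimise the output power w^H R w subject to a distortionless response being guaranteed for every steering vector within a mismatch ball ||a - a0|| <= eps. Below is a minimal sketch of that idea using cvxpy; the array size, synthetic covariance, presumed steering vector, and the value of eps are all illustrative assumptions, not values from the paper.

```python
import numpy as np
import cvxpy as cp

# --- Hypothetical setup (stand-ins for real array data) ---
M = 10                                   # number of sensors
rng = np.random.default_rng(0)
snap = rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200))
R = snap @ snap.conj().T / 200           # sample covariance matrix
theta = np.deg2rad(5.0)                  # presumed direction of arrival
a0 = np.exp(-1j * np.pi * np.arange(M) * np.sin(theta))  # presumed steering vector
eps = 0.3 * np.sqrt(M)                   # assumed mismatch bound ||a - a0|| <= eps

# Worst-case distortionless response as a second-order cone program:
#   minimise   w^H R w
#   subject to Re(w^H a0) >= eps * ||w|| + 1,  Im(w^H a0) == 0
L = np.linalg.cholesky(R)                # R = L L^H, so w^H R w = ||L^H w||^2
w = cp.Variable(M, complex=True)
wh_a0 = cp.conj(w) @ a0                  # the scalar w^H a0
prob = cp.Problem(
    cp.Minimize(cp.sum_squares(L.conj().T @ w)),
    [cp.real(wh_a0) >= eps * cp.norm(w, 2) + 1,
     cp.imag(wh_a0) == 0],
)
prob.solve()
print("beamformer weights:", w.value)
```

The single cone constraint replaces the infinite family of distortionless constraints over the mismatch ball, which is what makes the worst-case problem tractable for a standard convex solver.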
This paper addresses the problem of localizing multiple competing speakers in the presence of room reverberation, where sound sources can be positioned at any azimuth on the horizontal plane. To reduce the number of front-back confusions, which occur because interaural time differences (ITDs) and interaural level differences (ILDs) are similar in the front and rear hemifields, a machine hearing system is presented which combines supervised learning of binaural cues using multi-conditional training (MCT) with a head movement strategy. A systematic evaluation showed that this approach substantially reduced the number of front-back confusions in challenging acoustic scenarios. Moreover, the system was able to generalize to a variety of acoustic conditions not seen during training.
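The head-movement idea rests on a simple geometric fact: under a spherical-head model, ITD(theta) equals ITD(180 - theta), but a head rotation shifts the two hypotheses in opposite directions, so the post-rotation ITD picks the hemifield. The following toy sketch illustrates that mechanism only; it is not the paper's MCT-based system, and the ear spacing, sound speed, and helper functions are hypothetical.

```python
import numpy as np

EAR_DISTANCE = 0.18     # assumed inter-ear spacing in metres
SPEED_OF_SOUND = 343.0  # m/s

def itd_model(azimuth_deg):
    """Spherical-head approximation: ITD = (d / c) * sin(azimuth).
    Note itd_model(theta) == itd_model(180 - theta): the front-back ambiguity."""
    return EAR_DISTANCE / SPEED_OF_SOUND * np.sin(np.deg2rad(azimuth_deg))

def estimate_itd(left, right, fs):
    """Estimate ITD from the peak of the cross-correlation of the ear signals."""
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)
    return lag / fs

def resolve_front_back(front_azimuth_deg, itd_after, rotation_deg):
    """After rotating the head by rotation_deg, the front and rear hypotheses
    predict different ITDs; return the hemifield whose prediction is closer
    to the measured post-rotation ITD."""
    front_pred = itd_model(front_azimuth_deg - rotation_deg)
    back_pred = itd_model((180.0 - front_azimuth_deg) - rotation_deg)
    if abs(itd_after - front_pred) <= abs(itd_after - back_pred):
        return "front"
    return "back"
```

For example, sources at 30 degrees (front) and 150 degrees (rear) produce the same ITD before any movement; after a 20-degree rotation the front hypothesis predicts itd_model(10) and the rear predicts itd_model(130), which differ, so the measured ITD resolves the ambiguity.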