Despite successful applications of multi-channel signal processing in robust automatic speech recognition (ASR), relatively little research has examined the effectiveness of such techniques for robust speaker recognition. This paper introduces time-frequency (T-F) masking-based beamforming to address text-independent speaker recognition in conditions where strong diffuse noise and reverberation are both present. We examine several masking-based beamformers, including the parameterized multi-channel Wiener filter, the generalized eigenvalue (GEV) beamformer, and the minimum variance distortionless response (MVDR) beamformer, and evaluate their performance in terms of speaker recognition accuracy for i-vector and x-vector based systems. In addition, we present an alternative formulation for estimating steering vectors from speech covariance matrices. We show that a rank-1 approximation of the speech covariance matrix based on generalized eigenvalue decomposition yields the best results for the masking-based MVDR beamformer. Experiments on the recently introduced NIST SRE 2010 retransmitted corpus show that the MVDR beamformer with rank-1 approximation provides an absolute reduction of 5.55% in equal error rate compared with a standard masking-based MVDR beamformer.
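To make the rank-1 GEVD idea named in this abstract concrete, here is a minimal NumPy/SciPy sketch of a masking-based MVDR beamformer. The function names, array shapes, mask-weighted covariance estimator, and diagonal loading are illustrative assumptions, not the paper's implementation; only the core step (taking the principal generalized eigenvector of the speech/noise covariance pair and using the noise covariance times that eigenvector as a rank-1 steering-vector estimate) follows the technique the abstract describes.

```python
import numpy as np
import scipy.linalg as la

def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance.

    stft: complex array (channels, freqs, frames); mask: (freqs, frames).
    Returns one covariance matrix per frequency bin, shape (freqs, C, C).
    """
    weighted = stft * mask[None]                  # weight each T-F bin by the mask
    cov = np.einsum('cft,dft->fcd', weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=-1), 1e-8)    # mask mass per frequency
    return cov / norm[:, None, None]

def rank1_mvdr(cov_speech, cov_noise, ref=0):
    """Per-frequency MVDR weights with a rank-1 GEVD steering estimate."""
    F, C, _ = cov_speech.shape
    weights = np.zeros((F, C), dtype=complex)
    for f in range(F):
        phi_n = cov_noise[f] + 1e-6 * np.eye(C)   # diagonal loading (assumed)
        # Principal generalized eigenvector of (Phi_s, Phi_n); the rank-1
        # steering-vector estimate is Phi_n @ v, up to a complex scale.
        _, vecs = la.eigh(cov_speech[f], phi_n)
        steer = phi_n @ vecs[:, -1]
        steer = steer / steer[ref]                # fix scale at reference mic
        num = la.solve(phi_n, steer)              # Phi_n^{-1} d
        weights[f] = num / (steer.conj() @ num)   # MVDR: Phi_n^{-1}d / (d^H Phi_n^{-1} d)
    return weights
```

Given masks for speech and noise, the beamformed output would be obtained as `np.einsum('fc,cft->ft', rank1_mvdr(phi_s, phi_n).conj(), stft)`, i.e. y(f, t) = w(f)^H x(f, t).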
The practical efficacy of deep-learning-based speaker separation and/or dereverberation hinges on its ability to generalize to conditions not employed during neural network training. The current study assessed generalization across substantially different training and test environments. Training and testing were performed using languages with no known common ancestry and correspondingly large linguistic differences: English for training and Mandarin for testing. Additional dimensions of generalization included an untrained speech corpus/recording channel, untrained target-to-interferer energy ratios, untrained reverberation room impulse responses, and untrained test talkers. A deep computational auditory scene analysis (CASA) algorithm, employing complex time-frequency masking to estimate both magnitude and phase, was used to segregate two concurrent talkers and simultaneously remove large amounts of room reverberation, thereby increasing the intelligibility of the target talker. Significant intelligibility improvements were observed for normal-hearing listeners in every condition. Benefit averaged 43.5 percentage points across conditions and was comparable to that obtained when both training and testing were performed in English. Benefit is projected to be considerably larger for individuals with hearing impairment. It is concluded that a properly designed and trained deep speaker separation/dereverberation network can generalize across vastly different acoustic environments, including different languages.
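The key signal-processing step here, complex T-F masking, can be illustrated with a minimal sketch. The placeholder `mask_fn` stands in for the trained network (an assumption, not the paper's architecture); the point shown is that multiplying the mixture STFT by a complex-valued mask modifies both magnitude and phase, unlike a real-valued magnitude mask.

```python
import numpy as np
from scipy.signal import stft, istft

def enhance(mixture, mask_fn, fs=16000, nperseg=512):
    """Segregate/dereverberate a mixture via complex T-F masking.

    mask_fn: callable mapping the mixture STFT to a complex-valued mask
    of the same shape (a hypothetical stand-in for the trained network).
    """
    _, _, Y = stft(mixture, fs=fs, nperseg=nperseg)
    M = mask_fn(Y)                 # complex mask, same shape as Y
    S_hat = M * Y                  # scales magnitude AND rotates phase
    _, x_hat = istft(S_hat, fs=fs, nperseg=nperseg)
    return x_hat
```

As a sanity check, `enhance(mix, lambda Y: np.ones_like(Y))` passes the mixture through unchanged, since an all-ones complex mask leaves every T-F bin's magnitude and phase intact.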