Hannah Muckenhirn scite author profile

In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker. We achieve this by training two separate neural networks: (1) A speaker recognition network that produces speaker-discriminative embeddings;(2) A spectrogram masking network that takes both noisy spectrogram and speaker embedding as input, and produces a mask. Our system significantly reduces the speech recognition WER on multi-speaker signals, with minimal WER degradation on single-speaker signals.

show abstract

Towards Directly Modeling Raw Speech Signal for Speaker Verification Using CNNS

Muckenhirn

Magimai.-Doss

Marcell

2018

View full text Add to dashboard Cite

Speaker verification systems traditionally extract and model cepstral features or filter bank energies from the speech signal. In this paper, inspired by the success of neural networkbased approaches to model directly raw speech signal for applications such as speech recognition, emotion recognition and anti-spoofing, we propose a speaker verification approach where speaker discriminative information is directly learned from the speech signal by: (a) first training a CNN-based speaker identification system that takes as input raw speech signal and learns to classify on speakers (unknown to the speaker verification system); and then (b) building a speaker detector for each speaker in the speaker verification system by replacing the output layer of the speaker identification system by two outputs (genuine, impostor), and adapting the system in a discriminative manner with enrollment speech of the speaker and impostor speech data. Our investigations on Voxforge database shows that this approach can yield systems competitive to state-of-the-art systems. An analysis of the filters in the first convolution layer shows that the filters give emphasis to information in low frequency regions (below 1000 Hz) and implicitly learn to model fundamental frequency information in the speech signal for speaker discrimination.

show abstract

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

Wang¹,

Muckenhirn²,

Wilson³

et al. 2018

Preprint

View full text Add to dashboard Cite

Overview of BTAS 2016 speaker anti-spoofing competition

Korshunov

Marcel

Muckenhirn

et al. 2016

View full text Add to dashboard Cite

This paper provides an overview of the Speaker Antispoofing Competition organized by Biometric group at Idiap Research Institute for the IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS 2016). The competition used AVspoof database, which contains a comprehensive set of presentation attacks, including, (i) direct replay attacks when a genuine data is played back using a laptop and two phones (Samsung Galaxy S4 and iPhone 3G), (ii) synthesized speech replayed with a laptop, and (iii) speech created with a voice conversion algorithm, also replayed with a laptop.The paper states competition goals, describes the database and the evaluation protocol, discusses solutions for spoofing or presentation attack detection submitted by the participants, and presents the results of the evaluation.

show abstract

End-to-End convolutional neural network-based voice presentation attack detection

Muckenhirn

Magimai-Doss

Marcel

2017

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.