Despite the superior classification ability of deep neural networks (DNNs), their performance suffers when there is a mismatch between training and testing conditions. Many speaker adaptation techniques have been proposed for DNN acoustic modeling, but in the case of environmental robustness progress remains limited. Techniques developed for speaker adaptation can also be used to handle the impact of environments at the same time, or the two approaches can be combined. Directly adapting the large number of DNN parameters is challenging when the adaptation set is small. The learning hidden unit contributions (LHUC) technique for unsupervised speaker adaptation of DNNs adds speaker-dependent parameters to an existing speaker-independent network to improve automatic speech recognition (ASR) performance for a target speaker using small amounts of adaptation data. This paper investigates LHUC for adapting a speech recognizer to target speakers and environments, quantifying the impacts of speaker and noise differences separately. Our findings show that LHUC is capable of adapting to both speaker and noise conditions at the same time. Compared to the speaker-independent model, relative word error rate (WER) improvements of about 9% to 13% are observed across all test conditions on the AMI meeting corpus.
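The abstract does not detail the LHUC mechanism itself; in the original formulation (Swietojanski and Renals), each hidden unit's output is re-scaled by a learned per-speaker amplitude a = 2·sigmoid(r), with the speaker-independent weights frozen and only the small vector r updated on the adaptation data. Below is a minimal NumPy sketch of this idea, assuming a single ReLU feed-forward layer; the class and variable names are illustrative, not from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LHUCLayer:
    """Hidden layer with speaker-dependent amplitude scaling (LHUC sketch).

    The speaker-independent weights W, b stay frozen during adaptation;
    only the per-unit LHUC parameters r are updated on the adaptation
    data. The effective amplitude a = 2*sigmoid(r) lies in (0, 2), so
    each hidden unit's contribution can be attenuated or amplified.
    """

    def __init__(self, W, b):
        self.W, self.b = W, b          # frozen speaker-independent parameters
        self.r = np.zeros(W.shape[1])  # LHUC parameters, one per hidden unit

    def forward(self, x):
        h = np.maximum(0.0, x @ self.W + self.b)  # ReLU hidden activations
        return 2.0 * sigmoid(self.r) * h          # elementwise LHUC re-scaling
```

Because only one scalar per hidden unit is adapted, the number of speaker-dependent parameters stays tiny relative to the full network, which is what makes adaptation feasible on small adaptation sets.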
In recent years, many deep learning techniques for single-channel sound source separation have been proposed, using recurrent, convolutional, and transformer networks. When multiple microphones are available, spatial diversity between speakers and background noise can be exploited in addition to spectro-temporal diversity by using multi-channel filters for sound source separation. Aiming at end-to-end multi-channel source separation, in this paper we propose a transformer-recurrent-U network (TRUNet), which directly estimates multi-channel filters from multi-channel input spectra. TRUNet consists of a spatial processing network with an attention mechanism across microphone channels, aiming at capturing spatial diversity, and a spectro-temporal processing network, aiming at capturing spectral and temporal diversities. In addition to multi-channel filters, we also consider estimating single-channel filters from multi-channel input spectra using TRUNet. We train the network on a large reverberant dataset using a combined compressed mean-squared error loss function, which further improves sound separation performance. We evaluate the network on a realistic and challenging reverberant dataset, generated from room impulse responses measured with an actual microphone array. The experimental results on realistic reverberant sound source separation show that the proposed TRUNet outperforms state-of-the-art single-channel and multi-channel source separation methods.
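The abstract does not spell out the combined compressed MSE loss. A common instantiation in speech enhancement and separation combines a magnitude-only term and a phase-aware term computed on power-compressed spectra. The sketch below shows that generic form, assuming a compression exponent c and mixing weight alpha; these constants and the function name are hypothetical placeholders, and the paper's exact loss may differ.

```python
import numpy as np

def compressed_mse_loss(S_hat, S, c=0.3, alpha=0.3):
    """Combined compressed-spectrum MSE between estimated and target STFTs.

    S_hat, S : complex STFTs of the separated and reference signals.
    c        : magnitude compression exponent (illustrative value).
    alpha    : weight between the magnitude-only and phase-aware terms
               (illustrative value; the paper's setting may differ).
    """
    mag_hat, mag = np.abs(S_hat) ** c, np.abs(S) ** c
    # Phase-aware term: compressed magnitudes re-attached to original phases
    phase_hat = S_hat / np.maximum(np.abs(S_hat), 1e-12)
    phase = S / np.maximum(np.abs(S), 1e-12)
    complex_term = np.mean(np.abs(mag_hat * phase_hat - mag * phase) ** 2)
    # Magnitude-only term on the compressed spectra
    mag_term = np.mean((mag_hat - mag) ** 2)
    return alpha * mag_term + (1.0 - alpha) * complex_term
```

Compressing the magnitudes de-emphasizes dominant spectral components so that quieter time-frequency regions also contribute to the gradient, which is the usual motivation for this family of losses.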