ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414092

Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism

Abstract: In this paper, we present a novel multi-channel speech extraction system to simultaneously extract multiple clean individual sources from a mixture in noisy and reverberant environments. The proposed method is built on an improved multi-channel time-domain speech separation network which employs speaker embeddings to identify and extract multiple targets without label permutation ambiguity. To efficiently inform the speaker information to the extraction model, we propose a new speaker conditioning mechanism by…

Citations: cited by 9 publications (3 citation statements)
References: 25 publications
“…End-to-end speech separation models take waveforms as both input and output and have made great progress in both single-channel and multi-channel cases recently [3, 19-23]. The architecture usually consists of an encoder, a decoder and a separator module for estimating a mask for each source.…”
Section: Methods (mentioning)
confidence: 99%
“…For knowledge distillation, we used separation models outperforming the Conv-TasNet in the supervised learning case, while maintaining a smaller model size. The dual-path RNN (DPRNN) [19] and the U-Convolutional block (U-ConvBlock) based multi-channel separation networks [23] have been selected for the single- and the multi-channel tasks, respectively.…”
Section: Separation Network Configurations (mentioning)
confidence: 99%
“…However, restrictions on the dynamics of the spectrogram and the utilization of unknown microphones cause performance degradation in these traditional techniques [8]. In the past decade, data-driven methods [9]-[12] relying on deep learning frameworks have yielded tremendous improvements in the performance and robustness of ASR systems compared with the previously proposed methods [3]-[7]. The deep clustering (DPCL) approach [13] was firstly proposed, and its variations [14]-[16] were considered advanced methods to solve source separation problems compared to the other neural networks.…”
Section: Introduction (mentioning)
confidence: 99%