ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9054177

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation

Abstract: An important problem in ad-hoc microphone speech separation is how to guarantee the robustness of a system with respect to the locations and numbers of microphones. The former requires the system to be invariant to different indexing of the microphones with the same locations, while the latter requires the system to be able to process inputs with varying dimensions. Conventional optimization-based beamforming techniques satisfy these requirements by definition, while for deep learning-based end-to-end systems t…

Citations: Cited by 147 publications (85 citation statements)
References: References 25 publications
“…We use the same dataset proposed in [57] for the single-channel noisy reverberant speech separation task. The simulated dataset contains 20000, 5000 and 3000 4-second long utterances sampled at 16 kHz sample rate for training, validation and test sets, respectively.…”
Section: Experiments Configurations A. Data Simulation (mentioning)
confidence: 99%
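
For scale, the split sizes quoted above can be turned into samples per utterance and total audio hours. The following is only a back-of-the-envelope sketch: the constant and variable names are illustrative and do not come from the cited papers, and it assumes every utterance is exactly 4 seconds long, as the statement says.

```python
# Figures taken from the quoted citation statement; names are illustrative.
SAMPLE_RATE_HZ = 16_000
UTTERANCE_SECONDS = 4
SPLITS = {"training": 20_000, "validation": 5_000, "test": 3_000}

# Number of audio samples in one 4-second utterance at 16 kHz.
samples_per_utterance = SAMPLE_RATE_HZ * UTTERANCE_SECONDS

# Total duration of each split in hours.
hours = {
    name: count * UTTERANCE_SECONDS / 3600
    for name, count in SPLITS.items()
}

print(f"samples per utterance: {samples_per_utterance}")
for name, duration in hours.items():
    print(f"{name}: {duration:.1f} h")
```

Under these assumptions each utterance holds 64000 samples, and the training split amounts to roughly 22 hours of audio.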
“…1) Transform-average-concatenate (TAC) [57]: TAC was proposed for the multichannel speech separation task with ad-hoc microphone arrays where no microphone indexing or geometry information is known in advance. The design particularly matches our need in the GroupComm module where "group indices", i.e., the sequential order of the features in different groups, does not exist.…”
Section: B. Model Configurations 1) Separation Pipeline Configurations (mentioning)
confidence: 99%
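
The transform-average-concatenate idea named in the statement above can be illustrated with a small NumPy sketch. This is a hedged illustration of the general pattern (a shared per-channel transform, a mean over channels, then concatenation of that mean back onto every channel), not the authors' implementation: the function name `tac`, the weight names, the ReLU activation, the residual connection, and the layer sizes are all assumptions made for the example.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def tac(features, w_transform, w_average, w_concat):
    """Illustrative transform-average-concatenate step.

    features: array of shape (channels, feature_dim); the number of
    channels may vary between calls because all weights are shared.
    """
    # Transform: the same weights are applied to every channel.
    transformed = relu(features @ w_transform)                # (C, H)
    # Average: the mean over channels ignores channel order and count.
    pooled = relu(transformed.mean(axis=0) @ w_average)       # (H,)
    # Concatenate: append the pooled vector to each channel's features.
    pooled = np.broadcast_to(pooled, transformed.shape)       # (C, H)
    combined = np.concatenate([transformed, pooled], axis=1)  # (C, 2H)
    # Residual connection (an assumption of this sketch).
    return features + relu(combined @ w_concat)               # (C, D)


rng = np.random.default_rng(0)
D, H = 8, 16  # hypothetical feature and hidden sizes
w_transform = rng.standard_normal((D, H))
w_average = rng.standard_normal((H, H))
w_concat = rng.standard_normal((2 * H, D))

x = rng.standard_normal((4, D))  # four channels
y = tac(x, w_transform, w_average, w_concat)

# Reordering the input channels reorders the outputs in the same way.
perm = np.array([2, 0, 3, 1])
y_perm = tac(x[perm], w_transform, w_average, w_concat)
print(np.allclose(y[perm], y_perm))
```

Because the only cross-channel operation is a mean, the same weights accept any number of channels, and no channel index is ever used. That is the property the quoted statement relies on when it says the order of the groups "does not exist".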
“…We evaluate our approach on a simulated noisy reverberant two-speaker dataset [31]. 20000, 5000 and 3000 4-second long utterances are simulated for training, validation and test sets, respectively.…”
Section: Dataset (mentioning)
confidence: 99%
“…We evaluate the different model configurations on a simulated noisy reverberant two-speaker dataset [27]. 20000, 5000 and 3000 4-second long utterances are simulated for training, validation and test sets, respectively.…”
Section: Dataset (mentioning)
confidence: 99%