Interspeech 2019
DOI: 10.21437/interspeech.2019-1550

Recursive Speech Separation for Unknown Number of Speakers

Abstract: In this paper, we propose a method of single-channel, speaker-independent multi-speaker speech separation for an unknown number of speakers. As opposed to previous works, in which the number of speakers is assumed to be known in advance and speech separation models are specific to the number of speakers, our proposed method can be applied to cases with different numbers of speakers using a single model by recursively separating a speaker. To make the separation model recursively applicable, we propose one-and-re…
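
The recursive idea in the abstract can be illustrated with a minimal Python sketch: a single separator splits the current signal into one speaker and a residual, and is then re-applied to the residual. This is not the authors' implementation. The names `recursive_separate`, `toy_separate_one`, `energy_floor`, and `max_speakers` are hypothetical, the toy separator stands in for a trained model, and the energy-based stopping rule is an assumption, since the visible part of the abstract does not say how the recursion ends.

```python
from typing import Callable, List, Tuple

Signal = List[float]
# A separator maps a mixture to (one extracted speaker, residual mixture).
Separator = Callable[[Signal], Tuple[Signal, Signal]]


def energy(x: Signal) -> float:
    """Sum of squared samples."""
    return sum(v * v for v in x)


def recursive_separate(
    mixture: Signal,
    separate_one: Separator,
    energy_floor: float = 1e-8,   # assumed stopping threshold
    max_speakers: int = 10,       # safety cap on the recursion depth
) -> List[Signal]:
    """Peel off one speaker at a time until the residual is nearly silent."""
    speakers: List[Signal] = []
    residual = list(mixture)
    while energy(residual) > energy_floor and len(speakers) < max_speakers:
        one, residual = separate_one(residual)
        speakers.append(one)
    return speakers


def toy_separate_one(x: Signal) -> Tuple[Signal, Signal]:
    """Hypothetical stand-in for a trained model: treats the largest-magnitude
    entry as 'one speaker' and everything else as the residual."""
    k = max(range(len(x)), key=lambda i: abs(x[i]))
    one = [v if i == k else 0.0 for i, v in enumerate(x)]
    rest = [0.0 if i == k else v for i, v in enumerate(x)]
    return one, rest


# Toy mixture in which each "speaker" occupies a single entry.
mixture = [0.0, 3.0, 0.0, 1.0, 2.0]
estimates = recursive_separate(mixture, toy_separate_one)
print(len(estimates))
```

The same `recursive_separate` loop handles any number of sources without retraining or reconfiguring the separator, which is the property the abstract emphasizes; only the stopping rule decides how many outputs are produced.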

Cited by 69 publications (53 citation statements)
References: 26 publications
“…It sometimes confidently extracts or tracks two different speakers' signals with one speaker embedding vector $\{z_{b,i}\}_{1 \le b \le B}$, probably because their voice characteristics are similar from the system's point of view. This type of error should be reduced by, for example, employing more advanced NN architecture [18], and increasing the number of speakers in training data like we propose in [29].…”
Section: Discussion (mentioning)
confidence: 99%
“…A major design choice in music source separation models is whether to (1) train a separate model for each instrument [12], (2) to use a single class-conditional model, or (3) to use an instrument agnostic approach [16]. Our approach aims to combine the advantages of the first two; the high-precision of independent models, with improved optimization via parameter sharing in single models.…”
Section: Related Work (mentioning)
confidence: 99%
“…A few earlier works have proposed to iteratively estimate the sources using deep neural networks in a single-channel setting [2,12]. To the best of our knowledge, this paper is the first such study in a multichannel setting, where we estimate both the DOAs and the masks of all speakers.…”
Section: Introduction (mentioning)
confidence: 99%