Interspeech 2019
DOI: 10.21437/interspeech.2019-1940

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features

Abstract: Deep clustering (DC) and utterance-level permutation invariant training (uPIT) have been shown to be promising for speaker-independent speech separation. DC is usually formulated as a two-step process, embedding learning followed by embedding clustering, which results in complex separation pipelines and makes it hard to optimize the actual separation objective directly. As for uPIT, it only minimizes the error of the chosen permutation, the one with the lowest mean square error, and does not discriminate it from the other permutations. In thi…
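
The permutation selection that the abstract attributes to uPIT can be illustrated with a minimal sketch. It is not the paper's implementation and does not cover the proposed discriminative objective, which the truncated abstract does not describe. The function name `upit_mse` and the array layout `(num_speakers, frames, bins)` are assumptions made for this illustration.

```python
from itertools import permutations

import numpy as np


def upit_mse(estimates, targets):
    """Utterance-level PIT with a mean-square-error criterion (illustrative).

    estimates, targets: arrays of shape (num_speakers, frames, bins).
    This layout is an assumption of the sketch, not taken from the paper.
    Returns the lowest MSE and the permutation that achieves it.
    """
    num_speakers = targets.shape[0]
    best_loss, best_perm = None, None
    for perm in permutations(range(num_speakers)):
        # One assignment for the whole utterance: estimate perm[s] is
        # compared against target s across all frames and bins.
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    # Only the winning permutation contributes to the loss; the other
    # permutations are discarded, which is the limitation the abstract notes.
    return best_loss, best_perm
```

Because only the minimum over permutations is returned, nothing in this criterion pushes the losing permutations away from the chosen one.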

Cited by 19 publications (19 citation statements)
References 21 publications
“…Deep clustering approaches have been proposed and showed success in separation tasks [5], [14]. Many techniques reported consisted of multiple stages separately optimized under different criteria, such as signal representation and embedding [15]. Some embedding clustering methods add phase information on multi-channel, and other research concerns the low-delay of deep clustering approaches [16], [17].…”
Section: Related Work a Speech Separation Based On Deep Clustering (mentioning)
confidence: 99%
“…uPIT suffers from permutation problem. To overcome the issue of permutation authors proposed Deep Clustering (DC) with uPIT [45,46]. DC, Deep Attractor Network [47,] and uPIT can predict the assignments at the utterance level of all TF bins at once, without the need for frame based assignment, which is the main cause of the permutation problem.…”
Section: I (mentioning)
confidence: 99%
“…Qualitative comparison is drawn with other models using source-to-distortion ratio (SDR) [44]. Other measurement measures include signal-to-distortion ratio improvement (SI-SDR) [45], perceptual estimate of speech efficiency (PESQ) scores [46], scale-invariant signal to-noise ratio (SI-SNR) [47]. SDR, SI-SDR, PESQ, SI-SNR higher values reflect better quality of separation.…”
Section: Algorithm 3: Prepare Testing Set (mentioning)
confidence: 99%
“…Also U-Net-based approaches provide competitive results in this context, both for monaural [6,7] and multichannel SE tasks [8], at the expense of higher computational power demand. Other techniques to perform SE include recurrent neural networks (RNNs) [9], graph-based spectral subtraction [10], discriminative learning [11], dilated convolutions [12].…”
Section: Introduction (mentioning)
confidence: 99%