2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2017.7952154
Permutation invariant training of deep models for speaker-independent multi-talker speech separation

Abstract: We propose a novel deep learning training criterion, named permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Different from the multi-class regression technique and the deep clustering (DPCL) technique, our novel approach minimizes the separation error directly. This strategy effectively solves the long-lasting label permutation problem that has prevented progress on deep-learning-based techniques for speech separation. W…
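
The criterion the abstract describes amounts to scoring every assignment of model outputs to reference speakers and training on the cheapest one. Below is a minimal PyTorch sketch of such a permutation-invariant loss, assuming magnitude-spectrogram targets, a fixed number of sources, and a mean-squared error; these are illustrative assumptions, not the paper's exact configuration.

```python
from itertools import permutations
import torch

def pit_mse_loss(estimates, targets):
    """estimates, targets: (batch, n_src, time, freq) tensors.

    Evaluate the per-utterance MSE under every assignment of model
    outputs to reference speakers and keep the cheapest one, so the
    loss itself is invariant to the ordering of the outputs.
    """
    n_src = estimates.shape[1]
    losses = []
    for perm in permutations(range(n_src)):  # n_src! candidate assignments
        err = torch.mean(
            (estimates[:, list(perm)] - targets) ** 2, dim=(1, 2, 3)
        )                                    # (batch,) error per utterance
        losses.append(err)
    # (n_perms, batch) -> best permutation chosen independently per utterance
    return torch.stack(losses).min(dim=0).values.mean()
```

Because the number of permutations grows factorially with the number of sources, this brute-force enumeration is only practical for the small speaker counts typical of this task (2!, 3!, ...).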

Cited by 736 publications (558 citation statements) | References 21 publications
“…Two deep learning based methods have been proposed to resolve these problems, known as "deep clustering (DC) [8]" and "permutation invariant training (PIT) [9]". In deep clustering, a network is trained to generate a discriminative embedding for each time-frequency (T-F) bin, with points belonging to the same source forced to be closer to each other.…”
Section: Introduction | Confidence: 99%
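
For context on the deep clustering loss this excerpt describes, here is a hedged sketch of the standard DC objective ||VV^T - YY^T||_F^2, where V holds per-bin embeddings and Y the ideal per-bin source assignments. The algebraic expansion below is a common efficiency trick; the names and shapes are assumptions, not this paper's code.

```python
import torch

def deep_clustering_loss(V, Y):
    """V: (batch, TF, D) unit-norm embeddings per time-frequency bin.
    Y: (batch, TF, S) one-hot ideal-assignment labels (dominant source
    per T-F bin).

    Expanding ||VV^T - YY^T||_F^2 into three small Gram matrices avoids
    forming the TF x TF affinity matrices explicitly.
    """
    VtV = torch.bmm(V.transpose(1, 2), V)   # (batch, D, D)
    VtY = torch.bmm(V.transpose(1, 2), Y)   # (batch, D, S)
    YtY = torch.bmm(Y.transpose(1, 2), Y)   # (batch, S, S)
    return (VtV.pow(2).sum() - 2 * VtY.pow(2).sum()
            + YtY.pow(2).sum()) / V.shape[0]
```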
“…permutations), and use the permutation with the lowest error to update the network. PIT was first proposed in [8], and was later shown to have performance comparable to DC [9]. However, the PIT approach suffers from the output dimension mismatch problem because it assumes a fixed number of sources.…”
Section: Introduction | Confidence: 99%
“…Deep clustering based separation techniques that use spectrogram embeddings were proposed in [4,5,6]. Permutation invariant training (PIT) [7] was also developed as a general solution to map single-channel, monaural mixed speech inputs to those of individual speakers.…”
Section: Introduction | Confidence: 99%
“…The batch size is set to 32. Permutation invariant training [17] is adopted to tackle the label permutation problem.…”
Section: Network and Training Details | Confidence: 99%
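
To illustrate how such a PIT objective slots into training with a batch size of 32, as in the setup quoted above, here is a toy training step reusing the pit_mse_loss sketched earlier. The two-layer model, tensor shapes, and optimizer choice are placeholder assumptions for illustration only.

```python
import torch

# Stand-in separator: maps a mixture spectrogram to 2 speaker estimates.
model = torch.nn.Sequential(
    torch.nn.Linear(257, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 2 * 257),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

mixture = torch.randn(32, 100, 257)     # (batch=32, frames, freq bins)
targets = torch.randn(32, 2, 100, 257)  # two reference speakers

# Reshape the network output into one estimate per speaker, then train
# with the permutation-invariant loss so no fixed labeling is needed.
estimates = model(mixture).view(32, 100, 2, 257).permute(0, 2, 1, 3)
loss = pit_mse_loss(estimates, targets)
opt.zero_grad()
loss.backward()
opt.step()
```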