2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017
DOI: 10.1109/icassp.2017.7952155

Deep attractor network for single-microphone speaker separation

Abstract: Despite the overwhelming success of deep learning in various speech processing tasks, the problem of separating simultaneous speakers in a mixture remains challenging. Two major difficulties in such systems are the arbitrary source permutation and unknown number of sources in the mixture. We propose a novel deep learning framework for single channel speech separation by creating attractor points in high dimensional embedding space of the acoustic signals which pull together the time-frequency bins correspondin…

Cited by 381 publications (328 citation statements)
References: 15 publications
“…permutation during model training. Numerous extensions to these methods were proposed with different focuses [3,4,5,6,7,8].…”
Section: Introduction (mentioning)
confidence: 99%
“…More recent attempts to build computational models of sound segregation similarly focus on the intuitively plausible cue of temporal coincidence [24,25]. Current state-of-the-art engineering methods instead rely on learning how to group acoustic energy from labeled sound mixtures [44,45], but are at present difficult to probe for insight into the underlying acoustic dependencies. Our methodology falls between these two traditions, utilizing the rich set of constraints imposed by natural signals but providing interpretable insight into factors that might underlie grouping.…”
Section: Related Work (mentioning)
confidence: 99%
“…These vectors are then used to generate masks to filter out the individual speakers from the mixture. By using these intermediate embedding vectors instead of directly outputting the masks, the so-called permutation problem [11] is avoided. This paper is organised as follows.…”
Section: Introduction (mentioning)
confidence: 99%
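
The mask-generation step described in the statement above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration, not the cited authors' implementation: the function names, the random stand-in embeddings, the one-hot assignment matrix, and the choice of a softmax over dot-product similarities. It takes one embedding vector per time-frequency bin, averages the embeddings assigned to each speaker to form an attractor, and turns the similarity between each bin and each attractor into a soft mask.

```python
import numpy as np

def attractors_from_assignments(embeddings, assignments):
    """Average the embeddings of the bins assigned to each source.

    embeddings:  (n_bins, emb_dim) array, one vector per time-frequency bin.
    assignments: (n_bins, n_sources) one-hot array (hypothetical training labels).
    Returns an (n_sources, emb_dim) array of attractor points.
    """
    counts = assignments.sum(axis=0)              # bins per source
    sums = assignments.T @ embeddings             # summed embeddings per source
    return sums / np.maximum(counts, 1e-8)[:, None]

def masks_from_attractors(embeddings, attractors):
    """Softmax over bin-to-attractor similarities; each row sums to 1."""
    scores = embeddings @ attractors.T            # (n_bins, n_sources)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)

# Toy data: random values stand in for a trained network's output.
rng = np.random.default_rng(0)
n_bins, emb_dim, n_sources = 6, 4, 2
embeddings = rng.normal(size=(n_bins, emb_dim))
labels = np.array([0, 0, 1, 1, 0, 1])             # dominant source per bin (assumed)
assignments = np.eye(n_sources)[labels]

attractors = attractors_from_assignments(embeddings, assignments)
masks = masks_from_attractors(embeddings, attractors)

# Apply the masks to a (toy) mixture magnitude, one column per source.
mixture_mag = np.abs(rng.normal(size=n_bins))
separated = masks * mixture_mag[:, None]
```

In this sketch, row c of `attractors` is built from column c of `assignments`, so mask c is already aligned with reference source c and no search over output orderings is needed. That is one way to read the statement's remark that the permutation problem is avoided.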