2015
DOI: 10.1016/j.specom.2015.01.006

Distant speech separation using predicted time–frequency masks from spatial features

Abstract: Speech separation algorithms face the difficult task of producing a high degree of separation without introducing unwanted artifacts. The time-frequency (T-F) masking technique applies a real-valued (or binary) mask to the signal's spectrum to filter out unwanted components. The practical difficulty lies in estimating the mask. Masks engineered purely for separation performance often introduce musical-noise artifacts into the separated signal. This lowers the perceptual …
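To make the masking step concrete, here is a minimal NumPy/SciPy sketch of applying an already-estimated real-valued mask to a mixture's spectrum and resynthesizing the signal. The function and parameter names are illustrative; the paper's actual contribution, predicting the mask from spatial features, is not reproduced here.

```python
# Minimal sketch of T-F masking as described in the abstract, assuming the
# real-valued mask has already been estimated (the paper predicts it from
# spatial features; that estimator is not reproduced here). Function and
# parameter names are illustrative.
import numpy as np
from scipy.signal import stft, istft

def apply_tf_mask(mixture, mask, fs=16000, nperseg=512):
    """Multiply the mixture spectrogram by a real-valued mask and resynthesize."""
    _, _, X = stft(mixture, fs=fs, nperseg=nperseg)          # complex spectrogram
    assert mask.shape == X.shape, "mask must match the T-F grid of the STFT"
    _, separated = istft(X * mask, fs=fs, nperseg=nperseg)   # masked resynthesis
    return separated

# Example of an oracle (ideal ratio) mask, usable only when the clean sources
# are known, e.g. for training targets or upper-bound evaluation:
#   mask = np.abs(S_target) / (np.abs(S_target) + np.abs(S_interference) + 1e-8)
```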

Cited by 41 publications (28 citation statements)
References 53 publications (56 reference statements)
“…However, when the sources are too close for the beamformer to discriminate (25°), the mask returned by the network is of no help for the multichannel filter, which performs as badly as the simple beamformer and barely improves compared to the mixture. This can be overcome by feeding the network with n̂1 (11). When the speakers are 90° apart, this does not significantly improve the already good performance of the system.…”
Section: Results (mentioning, confidence: 99%)
“…a mask [8][9][10]. In the multichannel case, several approaches have been proposed to pass spatial information directly to a DNN, for instance using phase difference features between non-coincident microphones [11] or coherence features [12]. However, in these two studies, the mask estimated by the DNN is still applied as a single-channel filter only.…”
Section: Introduction (mentioning, confidence: 99%)
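The quotation above mentions feeding inter-microphone phase-difference features to a DNN whose output mask is then applied as a single-channel filter. The sketch below illustrates that pipeline under stated assumptions: `mask_net` is a hypothetical trained model, and the cosine/sine feature layout is one common choice, not necessarily the one used in [11] or [12].

```python
# Sketch of the pipeline the quotation describes: inter-microphone phase
# difference features drive a DNN that outputs a T-F mask, and the mask is
# applied to a single reference channel only. `mask_net` is a hypothetical
# trained model; the cos/sin feature layout is an assumption.
import numpy as np
from scipy.signal import stft, istft

def phase_difference_features(x1, x2, fs=16000, nperseg=512):
    """Cosine and sine of the inter-channel phase difference per T-F bin."""
    _, _, X1 = stft(x1, fs=fs, nperseg=nperseg)
    _, _, X2 = stft(x2, fs=fs, nperseg=nperseg)
    ipd = np.angle(X1) - np.angle(X2)
    return np.stack([np.cos(ipd), np.sin(ipd)], axis=-1), X1

def separate_single_channel(x1, x2, mask_net, fs=16000, nperseg=512):
    feats, X_ref = phase_difference_features(x1, x2, fs, nperseg)
    mask = mask_net(feats)            # assumed: maps features to a mask in [0, 1]
    _, y = istft(X_ref * mask, fs=fs, nperseg=nperseg)
    return y                          # single-channel filtering of the reference mic
```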
“…In this direction, further improvements could be obtained by removing the Gaussianity assumption and employing more articulated modeling of the phase distribution. For instance, following recent trends in multi-channel enhancement, neural networks could be used [23,24,25]. A further limitation of the proposed approach is the anechoic phase modeling in Eq.…”
Section: Discussion (mentioning, confidence: 99%)
“…In addition, it cannot be applied when the sources are spatially close to each other. Conventional postfiltering techniques, which are mainly based on signal statistics and conventional single-channel speech enhancement [5], [1] or spatial filters computed using phase information [6], [7], [8], [1], usually cannot achieve high-quality noise reduction in reverberant multi-source environments.…”
Section: Introduction (mentioning, confidence: 99%)
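As a generic illustration of the conventional single-channel postfiltering this last quotation refers to, the sketch below applies a Wiener-style gain to a beamformer output using a separately estimated noise PSD. It is not the method of any of the cited works, and `noise_psd` is assumed to come from an external estimator (e.g., noise-only frames).

```python
# Generic illustration (not from the cited works) of a conventional
# single-channel Wiener-style postfilter applied to a beamformer output,
# the kind of statistics-based post-processing the quotation refers to.
# The noise PSD is assumed to be estimated elsewhere.
import numpy as np
from scipy.signal import stft, istft

def wiener_postfilter(beamformed, noise_psd, fs=16000, nperseg=512, gain_floor=0.1):
    """Apply a per-bin gain 1 - N(f)/|Y(f,t)|^2, floored for stability."""
    _, _, Y = stft(beamformed, fs=fs, nperseg=nperseg)
    noisy_psd = np.abs(Y) ** 2
    gain = np.maximum(1.0 - noise_psd[:, None] / (noisy_psd + 1e-12), gain_floor)
    _, enhanced = istft(Y * gain, fs=fs, nperseg=nperseg)
    return enhanced
```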