New Era for Robust Speech Recognition 2017
DOI: 10.1007/978-3-319-64680-0_3

Multichannel Spatial Clustering Using Model-Based Source Separation

Cited by 3 publications (2 citation statements)
References 47 publications
“…The rationale lies in that, when sources are (W-) disjoint orthogonal, the binaural cues will form clusters within each frequency band for spatially separated directional sources with different time delays. This is also the theoretical basis of spatial clustering technique [65], [66]. These spatial cues have been proven effective in deep learning-based frequency domain separation methods, especially when combined with spectral feature (e.g., logarithm power spectra, LPS) at input level [16], [19], [36], [40], [42].…”
Section: Introduction (mentioning)
confidence: 98%
“…When multiple microphones are available, spatial information can be leveraged for better separation, as speaker sources are directional and usually spatially separated in real-world environments. One stream of research to exploit this information is focused on spatial clustering [10], [11], [12], [13], [14], which clusters individual T-F units according to their spatial origins under the speech sparsity assumption [8], [15], using spatial cues such as interchannel time, phase or level differences (ITDs/IPDs/ILDs) and directional statistics. However, these approaches typically only consider spatial information, which is insufficient for separation in reverberant environments or when sound sources are close to one another.…”
Section: Introduction (mentioning)
confidence: 99%
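
The citation statements above describe spatial clustering as grouping T-F units by interchannel cues under a sparsity (W-disjoint orthogonality) assumption. The following is a minimal illustrative sketch of that idea only, not the method of the cited chapter or of the citing papers. It assumes an idealized two-microphone, anechoic, noise-free setting in which every T-F unit is owned by exactly one of two sources, with frequencies kept low enough that the phase difference does not wrap. It also pools delay estimates across all frequencies rather than clustering within each band. All function names, delays, and gains are hypothetical choices made for the example.

```python
import numpy as np

def simulate_two_channel_stft(n_freq=64, n_frames=50,
                              delays=(-2.0, 3.0), gains=(0.8, 1.2), seed=0):
    """Build synthetic two-channel STFT coefficients (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    # owner[f, t] says which source occupies each T-F unit (sparsity assumption).
    owner = rng.integers(0, 2, size=(n_freq, n_frames))
    src = (rng.standard_normal((n_freq, n_frames))
           + 1j * rng.standard_normal((n_freq, n_frames)))
    # Angular frequencies in rad/sample, capped so |omega * delay| < pi (no wrapping).
    max_delay = max(abs(d) for d in delays)
    omega = np.linspace(0.05, 0.9 * np.pi / max_delay, n_freq)[:, None]
    d = np.asarray(delays)[owner]   # per-unit delay in samples
    g = np.asarray(gains)[owner]    # per-unit level ratio
    x1 = src
    x2 = g * src * np.exp(-1j * omega * d)
    return x1, x2, omega, owner

def spatial_cues(x1, x2, omega):
    """Return interchannel phase difference, level difference (dB), and delay."""
    ipd = np.angle(x2 * np.conj(x1))               # equals -omega * delay here
    ild = 20.0 * np.log10(np.abs(x2) / np.abs(x1))
    delay = -ipd / omega                           # ITD estimate in samples
    return ipd, ild, delay

def two_means_1d(values, n_iter=20):
    """Plain 1-D k-means with k=2, initialized at the min and max values."""
    flat = values.ravel()
    centers = np.array([flat.min(), flat.max()])
    for _ in range(n_iter):
        labels = np.abs(flat[:, None] - centers[None, :]).argmin(axis=1)
        centers = np.array([flat[labels == k].mean() for k in range(2)])
    return labels.reshape(values.shape), centers

x1, x2, omega, owner = simulate_two_channel_stft()
ipd, ild, delay = spatial_cues(x1, x2, omega)
labels, centers = two_means_1d(delay)
print("estimated delays (samples):", centers)
print("fraction of T-F units assigned to the true source:",
      np.mean(labels == owner))
```

In this idealized case the delay estimates collapse onto one value per source, so the clusters are trivially separable. As the last citation statement notes, reverberation and closely spaced sources blur these cues, which is why spatial information alone is often insufficient in practice.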