2020
DOI: 10.48550/arxiv.2009.03141
Preprint

An End-to-end Architecture of Online Multi-channel Speech Separation

Citations: cited by 4 publications (6 citation statements)
References: 24 publications
“…After figuring out all candidate $N_A$ angles and $N_B$ beams from subnet AF and BF as well as unmixing mask $M^U_c \in \mathbb{C}^{T \times F}$, $c \in \{1, 2\}$, we utilize the proposed attentional selection mechanism from [17] for multi-channel feature selection.…”
Section: Attentional Feature Selection (mentioning)
confidence: 99%
“…Another popular way is to form a cascaded model or two-stage system following the enhancement-separation [13,14] or separation-enhancement [15,16] scheme. In [17], an end-to-end structure following the separation-enhancement processing is proposed, which enables the joint optimization of the speech unmixing and extraction and yield impressive improvement in online scenario. Besides, the work of [18] is proposed to increase the robustness of noisy speech separation under the recursive separation framework [11,19] and studies in [20,21] continue to consider the practical issue of recursive approach in real meeting scenarios.…”
Section: Introduction (mentioning)
confidence: 99%
“…With the advent of deep learning in recent years, blind speech separation methods have been widely studied to solve the cocktail party problem by applying neural networks [6][7][8][9] and beamforming [10][11][12][13]. The neural network seeks the regular patterns (i.e., masks) between the time-frequency representation of the target speech and mixture speech, while beamforming incorporates the spatial statistics (i.e., spatial covariance matrix) obtained from the estimated masks to compute beamformer's weights and filter the desired voice.…”
Section: Introduction (mentioning)
confidence: 99%
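
The mask-to-beamformer pipeline described in the statement above can be sketched roughly as follows. This is only an illustration under stated assumptions: the quoted text does not name a specific beamformer, so an MVDR beamformer in its reference-channel form is assumed here, and the function names (`masked_scm`, `mvdr_weights`, `beamform`) and array layouts are hypothetical. The masks would come from a neural network, which is not shown.

```python
import numpy as np

def masked_scm(Y, mask, eps=1e-8):
    """Mask-weighted spatial covariance matrix per frequency.

    Y:    complex STFT of the mixture, shape (C, T, F) (assumed layout)
    mask: real time-frequency mask, shape (T, F)
    Returns an array of shape (F, C, C).
    """
    # Sum over frames of mask[t, f] * y[t, f] y[t, f]^H
    num = np.einsum("tf,ctf,dtf->fcd", mask, Y, Y.conj())
    den = mask.sum(axis=0)[:, None, None] + eps
    return num / den

def mvdr_weights(scm_s, scm_n, ref=0, eps=1e-8):
    """MVDR weights (assumed variant): w = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s).

    scm_s, scm_n: speech and noise covariance matrices, shape (F, C, C)
    ref:          index of the reference microphone (selects the vector u)
    Returns weights of shape (F, C).
    """
    C = scm_n.shape[-1]
    scm_n = scm_n + eps * np.eye(C)          # small diagonal loading for stability
    numer = np.linalg.solve(scm_n, scm_s)    # Phi_n^-1 Phi_s, batched over F
    trace = np.trace(numer, axis1=-2, axis2=-1)[:, None]
    return numer[..., ref] / (trace + eps)

def beamform(Y, w):
    """Apply the filter: s_hat[t, f] = w[f]^H y[t, f]."""
    return np.einsum("fc,ctf->tf", w.conj(), Y)
```

In use, one would compute `masked_scm(Y, speech_mask)` and `masked_scm(Y, noise_mask)` from the estimated masks, pass both to `mvdr_weights`, and filter the mixture with `beamform`.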
“…Recently, the multi-channel speech separation achieves good performance [13,14] and has been successfully integrated into conversation transcription systems [15]. However, the improvement has still been limited with single channel input for the conversational tasks [16,17,18].…”
Section: Introduction (mentioning)
confidence: 99%