2019
DOI: 10.48550/arxiv.1904.03065
Preprint
Recursive speech separation for unknown number of speakers

Cited by 12 publications (14 citation statements)
References 18 publications
“…Thanks to the rapid development of deep learning in recent years, speech enhancement, separation and dereverberation have made remarkable progress, with consistent improvements on public datasets, e.g., WSJ0-2mix [1], VoiceBank-DEMAND [2] and the REVERB Challenge [3]. Various deep neural network (DNN) architectures have been proposed and reported to achieve significant improvements on speech enhancement [4,5,6] or separation [7,8,9,10,11] tasks. However, most of the models mentioned above focus on an individual enhancement or separation task and do not consider real-world environments where overlapping speech, directional/isotropic noise and reverberation may exist together, which leads us to consider adopting one universal model to handle speech enhancement, separation and dereverberation simultaneously.…”
Section: Introduction
confidence: 99%
“…In [17], an end-to-end structure following separation-then-enhancement processing is proposed, which enables the joint optimization of speech unmixing and extraction and yields impressive improvements in online scenarios. Besides, the work of [18] aims to increase the robustness of noisy speech separation under the recursive separation framework [11,19], and studies in [20,21] continue to consider the practical issues of the recursive approach in real meeting scenarios. The authors of [22] investigate a DNN-supported system that integrates conventional spatial-clustering and beamforming techniques.…”
Section: Introduction
confidence: 99%
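The recursive separation framework mentioned in the statement above extracts one speaker per pass and recurses on the residual until no speech remains. A minimal sketch of that control flow, where `separate_one` and `has_speech` are placeholder callables standing in for the cited works' trained separation model and stopping criterion:

```python
def recursive_separate(mixture, separate_one, has_speech, max_speakers=5):
    """One-speaker-at-a-time recursive separation: each pass splits the
    input into one extracted source and a residual; recursion stops when
    the residual no longer contains speech (or a speaker cap is hit).

    `separate_one` and `has_speech` are hypothetical callables, not
    names from the cited papers."""
    sources = []
    residual = mixture
    for _ in range(max_speakers):
        source, residual = separate_one(residual)
        sources.append(source)
        if not has_speech(residual):
            break
    return sources
```

Because the stopping test runs on the residual rather than on a fixed output count, the same loop handles mixtures with an unknown number of speakers.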
“…When initialized with a well-trained separation model using the feature-recovery objective (6), the permutation-computation logic is not necessary for the ASR objective, as shown in (7). For each training sample, we determine the label permutation by measuring the distance between Mc ⊙ |Y| and the spectrum of the reference signal |Xc|, c ∈ {0, 1}.…”
Section: Network Training
confidence: 99%
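The label-permutation step described above can be sketched as an exhaustive search over speaker orderings, scoring each by the distance between the masked mixture spectra M_c ⊙ |Y| and the reference magnitudes |X_c|. The mean-squared-error metric below is an assumption; the statement does not name the distance used:

```python
import itertools

import numpy as np


def best_permutation(masked_specs, ref_specs):
    """Return the speaker permutation (a tuple of indices) minimizing the
    total spectral distance between masked mixture spectra and reference
    magnitude spectra. MSE is an assumed distance, for illustration only."""
    n_spk = len(ref_specs)
    best, best_cost = None, np.inf
    for perm in itertools.permutations(range(n_spk)):
        # Sum the per-speaker distance under this candidate assignment.
        cost = sum(
            np.mean((masked_specs[perm[c]] - ref_specs[c]) ** 2)
            for c in range(n_spk)
        )
        if cost < best_cost:
            best, best_cost = perm, cost
    return best
```

For the two-speaker case in the statement (c ∈ {0, 1}) only two orderings are scored, so the exhaustive search is cheap.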
“…Both of them are trained on the 960 hours of LibriSpeech training data, using the word-piece units of the transcription as targets. The first ASR, named ASRmatched, is the ASR model used for the ASR-oriented training of speech separation following (7). It consists of 12 Conformer [18] encoder layers and 6 Transformer decoder layers, with 80-dimensional log fbank as the input feature.…”
Section: ASR Model
confidence: 99%