ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414264
Blind and Neural Network-Guided Convolutional Beamformer for Joint Denoising, Dereverberation, and Source Separation

Abstract: This paper proposes an approach for optimizing a Convolutional BeamFormer (CBF) that can jointly perform denoising (DN), dereverberation (DR), and source separation (SS). First, we develop a blind CBF optimization algorithm that requires no prior information on the sources or the room acoustics, by extending a conventional joint DR and SS method. For making the optimization computationally tractable, we incorporate two techniques into the approach: the Source-Wise Factorization (SW-Fact) of a CBF and the Indep…
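The abstract is truncated above, but the underlying idea of a convolutional beamformer can be summarized: a single linear filter operates on the current STFT frame stacked with delayed past frames, so that dereverberation (as in WPE/WPD) and beamforming are performed jointly rather than in a cascade. The sketch below illustrates a generic WPD-style convolutional beamformer for one frequency bin, not the paper's blind, source-wise factorized algorithm; the function name `wpd_beamformer` and the inputs `steering` and `power` (e.g. a DNN-estimated target power) are illustrative assumptions.

```python
import numpy as np

def wpd_beamformer(Y, steering, power, delay=3, taps=5, eps=1e-8):
    """
    Minimal WPD-style convolutional beamformer sketch for one frequency bin.

    Y        : (M, T) complex STFT observations (M mics, T frames)
    steering : (M,) complex steering vector of the target source
    power    : (T,) time-varying power estimate of the target source
    delay    : prediction delay before the first past tap (avoids over-whitening)
    taps     : number of past frames used by the convolutional part
    """
    M, T = Y.shape

    # Stack the current frame with delayed past frames: shape (M * (taps + 1), T)
    frames = [Y]
    for k in range(taps):
        shift = delay + k
        past = np.zeros_like(Y)
        past[:, shift:] = Y[:, :T - shift]
        frames.append(past)
    Ybar = np.concatenate(frames, axis=0)

    # Power-normalized spatio-temporal covariance of the stacked observations
    R = (Ybar / np.maximum(power, eps)) @ Ybar.conj().T / T

    # Steering vector padded with zeros for the past taps
    vbar = np.zeros(M * (taps + 1), dtype=complex)
    vbar[:M] = steering

    # Distortionless solution: w = R^{-1} v / (v^H R^{-1} v)
    Rinv_v = np.linalg.solve(R + eps * np.eye(R.shape[0]), vbar)
    w = Rinv_v / (vbar.conj() @ Rinv_v)

    # Enhanced (denoised and dereverberated) signal for this bin: (T,)
    return w.conj() @ Ybar
```

The blind algorithm in the paper avoids the oracle `steering` and `power` inputs assumed here; they are included only to keep the sketch self-contained.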

Cited by 17 publications (1 citation statement)
References: 33 publications
“…Possible integration solutions include: a) a pipelined architecture within which the speech separation and dereverberation components are sequentially connected in any order such as the previous researches in [21], [48], [68]; or b) a single architecture where both these two enhancement functions are implemented, for example, using weighted power minimization distortionless response (WPD) [69]- [71] and the related DNN TF-mask based WPD [72], [73] approaches. To date, such integration problem has only been investigated for audio-only speech enhancement [21], [69]- [77], but has not been studied for audio-visual speech separation and dereverberation.…”
Section: Introduction (mentioning, confidence: 99%)