ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414868

Combining Adaptive Filtering And Complex-Valued Deep Postfiltering For Acoustic Echo Cancellation

Abstract: In this contribution, we introduce a novel approach to noise-robust acoustic echo cancellation employing a complex-valued Deep Neural Network (DNN) for postfiltering. In a first step, early linear echo components are removed using a double-talk robust adaptive filter. The residual signal is subsequently processed by the proposed postfilter (PF). Due to its complex-valued nature, the PF allows to suppress unwanted signal components without introducing distortions to the near-end speaker. For training and evalua…
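
The first stage described in the abstract (removing linear echo components with an adaptive filter) can be illustrated with a minimal sketch. It uses a plain time-domain NLMS filter, which is a stand-in and not the double-talk robust filter of the paper; the function name, tap count, step size, and the synthetic decaying echo path are all assumptions made for illustration.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, num_taps=64, step=0.5, eps=1e-8):
    """Return the residual (mic minus estimated linear echo) using plain NLMS.

    Illustrative stand-in only: the paper's first stage is a double-talk
    robust adaptive filter, which this simple NLMS is not.
    """
    w = np.zeros(num_taps)        # adaptive estimate of the echo path
    buf = np.zeros(num_taps)      # most recent far-end samples, newest first
    residual = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)     # shift the delay line by one sample
        buf[0] = far_end[n]
        echo_hat = w @ buf        # estimated echo at the microphone
        residual[n] = mic[n] - echo_hat
        # Normalized LMS update of the filter coefficients
        w += step * residual[n] * buf / (buf @ buf + eps)
    return residual

# Synthetic far-end single-talk scenario (hypothetical data, no noise)
rng = np.random.default_rng(0)
far_end = rng.standard_normal(8000)
echo_path = rng.standard_normal(32) * np.exp(-0.2 * np.arange(32))
mic = np.convolve(far_end, echo_path)[:len(far_end)]
residual = nlms_echo_canceller(far_end, mic)
```

In a two-stage system of the kind the abstract outlines, `residual` would then be passed to the postfilter; that second stage is not sketched here because the excerpt gives no details of the network.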

Cited by 27 publications (13 citation statements)
References 27 publications
“…In [2] a similar idea was followed, however, only for a very small subset of the possible combinations, a rather simple feedforward neural network, and without explicit consideration of noise reduction capabilities. Furthermore, an intuitive and often seen input signal combination of a RES/NR network is the pair of enhanced signal E(k) and reference signal X(k), which delivers state-of-the-art performance, e.g., in [4,9]. But as the reference signal does not contain any information about the room characteristics yet, could the estimated echo signal D(k) be a better choice?…”
Section: Experimental Design (citation type: mentioning)
confidence: 99%
“…In this typical two-stage arrangement for speech enhancement, the application of a DNN as second stage (RES and NR) gained increasing attention with early investigations of feed-forward networks [1,2], convolutional networks bringing further improvements more recently [3,4], some even being fully synergistic with the first stage [5], and many more. In the meantime, also fully learned deep AEC approaches were proposed, where a single network incorporates the tasks of AEC, RES, and NR, e.g., [6,7] or further investigated in [8].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Since the TIMIT dataset is widely used in the literature to evaluate AEC performance, we follow the data preparation method referred to in [17,19], resulting in 3500 training mixtures and 300 test mixtures. The generation of inputs and labels is illustrated in Fig.…”
Section: Data Preparation (citation type: mentioning)
confidence: 99%
“…Four categories are obtained and act as classification labels: no voice exists at either branch (i.e., silence), voice exists only at the near end (i.e., near-end single-talk), voice exists only at the far end (i.e., far-end single-talk), and voice exists at both ends (i.e., double-talk). The RIRs in the experiments are configured as referred to in [17,19], resulting in 7 RIRs, of which the first 6 are used to generate training mixtures and the last one is used to generate test mixtures. Hard clipping is used to simulate the power amplifier of the loudspeaker, and a memoryless sigmoidal function is applied to emulate the nonlinear characteristic of the loudspeaker, resulting in x_nl(n) for nonlinear inputs.…”
Section: Data Preparation (citation type: mentioning)
confidence: 99%
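
The data preparation steps quoted above (four talk-state labels, hard clipping for the power amplifier, a memoryless sigmoid for the loudspeaker) might be sketched roughly as follows. The quoted statement gives neither the clipping threshold nor the exact sigmoid, so `x_max`, `gain`, `slope`, and all function names here are hypothetical choices for illustration, not the citing paper's values.

```python
import numpy as np

def hard_clip(x, x_max):
    """Simulate power-amplifier saturation by limiting the amplitude to +/- x_max."""
    return np.clip(x, -x_max, x_max)

def sigmoid_nonlinearity(x, gain=1.0, slope=4.0):
    """Memoryless sigmoidal mapping, odd-symmetric and bounded by +/- gain.

    The gain and slope values are assumptions; the source does not state them.
    """
    return gain * (2.0 / (1.0 + np.exp(-slope * x)) - 1.0)

def talk_state(near_active, far_active):
    """Map voice activity at the two ends to one of the four class labels."""
    if near_active and far_active:
        return "double-talk"
    if near_active:
        return "near-end single-talk"
    if far_active:
        return "far-end single-talk"
    return "silence"

# Hypothetical far-end signal: a sine wave that exceeds the clipping threshold
x = 1.5 * np.sin(2 * np.pi * 5 * np.linspace(0.0, 1.0, 1000))
x_clipped = hard_clip(x, x_max=0.8)      # assumed threshold
x_nl = sigmoid_nonlinearity(x_clipped)   # nonlinear loudspeaker input x_nl(n)
```

The nonlinear signal `x_nl` would then be convolved with a room impulse response to form the echo; that step is omitted because the RIR configuration is only given by reference to [17,19].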