Insights Into Deep Non-Linear Filters for Improved Multi-Channel Speech Enhancement

Tesch, Kristina; Gerkmann, Timo

doi:10.1109/taslp.2022.3221046

Cited by 39 publications

(16 citation statements)

References 68 publications

“…(4) Narrow-band Net [11] is one of our previous works that uses two layers of LSTM to only exploit the narrow-band spatial information (like the second module of the proposed network). (5) FT-JNF [14]: Kristina et al revised the Narrow-band Net [11] by replacing the first LSTM with an along-frequency LSTM to further exploit the fullband information (like the first and second modules together of the proposed network).…”

Section: Methods (mentioning)

confidence: 99%
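
The contrast drawn in the statement above, between an LSTM that runs along time within one frequency bin and one that runs along frequency within one time frame, comes down to how the multichannel spectrogram is cut into sequences. The following is a minimal sketch of those two arrangements only, not the cited networks themselves. The function names, the toy sizes, and the [time][frequency][channel] layout are assumptions made for illustration.

```python
# Hypothetical helper names; shapes are illustrative, not taken from the cited papers.

def make_spectrogram(num_frames, num_freqs, num_channels):
    """Build a toy [time][freq][channel] array whose entries encode their own index."""
    return [[[(t, f, c) for c in range(num_channels)]
             for f in range(num_freqs)]
            for t in range(num_frames)]

def narrow_band_sequences(spec):
    """One sequence per frequency bin, running over time (narrow-band view)."""
    num_frames = len(spec)
    num_freqs = len(spec[0])
    return [[spec[t][f] for t in range(num_frames)] for f in range(num_freqs)]

def along_frequency_sequences(spec):
    """One sequence per time frame, running over frequency (full-band view)."""
    return [list(frame) for frame in spec]

spec = make_spectrogram(num_frames=4, num_freqs=3, num_channels=2)
nb = narrow_band_sequences(spec)      # 3 sequences of length 4
fb = along_frequency_sequences(spec)  # 4 sequences of length 3
```

In both views each sequence element is the vector of microphone channels for one time-frequency bin; only the axis treated as the sequence dimension changes.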

“…The first two modules together follow a similar spirit as the FT-JNF network [14]. The major difference is that, besides the output of module 1, we also feed the narrow-band noisy signals to the second module, as the first module may lose some narrow-band information.…”

Section: Narrow-band Spatial Module (mentioning)

confidence: 99%
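
The statement above says the second module receives the narrow-band noisy signals in addition to the output of module 1. One simple way to picture this is a per-bin concatenation of the two feature vectors. This is a sketch under that assumption; the quoted text does not say how the two inputs are combined, and `fuse_inputs` and the feature sizes below are hypothetical.

```python
# Hypothetical names and sizes; concatenation is an assumed fusion rule.

def fuse_inputs(module1_out, noisy_features):
    """Concatenate per-bin feature vectors laid out as [time][freq][features]."""
    if len(module1_out) != len(noisy_features):
        raise ValueError("inputs must cover the same number of time frames")
    fused = []
    for m_frame, x_frame in zip(module1_out, noisy_features):
        if len(m_frame) != len(x_frame):
            raise ValueError("inputs must cover the same number of frequency bins")
        fused.append([list(m) + list(x) for m, x in zip(m_frame, x_frame)])
    return fused

# Toy data: 2 time frames, 3 frequency bins.
module1_out = [[[0.1 * (t + f)] * 4 for f in range(3)] for t in range(2)]  # 4 hidden features
noisy = [[[float(t), float(f)] for f in range(3)] for t in range(2)]       # 2 input features
fused = fuse_inputs(module1_out, noisy)
```

Each fused bin then carries the 4 features from module 1 followed by the 2 original noisy features, so the second module still sees the narrow-band input unchanged.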

“…The basic strategy is to exploit each type of information with a dedicated network. The single-channel FullSubNet [2] and multichannel FT-JNF network [14] have proved that this strategy is effective for fusing different types of information. Specifically, the proposed multi-cue fusion network (named McNet) cascades four modules, including a full-band spatial, narrow-band spatial, sub-band spectral and full-band spectral module.…”

Section: Introduction (mentioning)

confidence: 98%

“…Each module uses one layer of the LSTM network, by which the temporal dynamic of each type of information can be properly modeled. Compared with two SOTA networks, i.e., CA Dense U-net [12] and FT-JNF [14], experiments show that the proposed network achieves notably better speech enhancement performance.…”

Section: Introduction (mentioning)

confidence: 99%

“…In our previous works [11,13], a network is proposed to focus on the narrow-band spatial information, namely the difference of spatial correlation between speech and noise formulated in narrow-band. [14] cascades a full-band network with the narrow-band network to exploit the full-band information simultaneously, and achieves the state-of-the-art (SOTA) performance.…”

Section: Introduction (mentioning)

confidence: 99%

McNet: Fuse Multiple Cues for Multichannel Speech Enhancement

Yang, Quan, Li

2022

Preprint

In multichannel speech enhancement, both spectral and spatial information are vital for discriminating between speech and noise. How to fully exploit these two types of information and their temporal dynamics remains an interesting research problem. As a solution to this problem, this paper proposes a multi-cue fusion network named McNet, which cascades four modules to respectively exploit the full-band spatial, narrow-band spatial, sub-band spectral, and full-band spectral information. Experiments show that each module in the proposed network has its unique contribution and, as a whole, notably outperforms other state-of-the-art methods.
