2023
DOI: 10.1109/access.2023.3324210

Multi-Attention Bottleneck for Gated Convolutional Encoder-Decoder-Based Speech Enhancement

Nasir Saleem,
Teddy Surya Gunawan,
Muhammad Shafi
et al.

Abstract: The convolutional encoder-decoder (CED) has emerged as a powerful architecture, particularly in speech enhancement (SE), which aims to improve the quality and intelligibility of noise-contaminated speech. This architecture leverages the strength of convolutional neural networks (CNNs) in capturing high-level features. Usually, CED architectures use a gated recurrent unit (GRU) or long short-term memory (LSTM) as a bottleneck to capture temporal dependencies, enabling an SE model to eff…
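The abstract describes a recurrent bottleneck (GRU or LSTM) between the convolutional encoder and decoder that carries temporal context across frames. A minimal single GRU cell in pure NumPy (random weights, hypothetical dimensions; illustrative only, not the paper's implementation) shows how that state accumulates over a frame sequence:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell with random weights, for illustration only."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # small init keeps activations in a sane range
        # update gate (z), reset gate (r), candidate state weights
        self.Wz = s * rng.standard_normal((hidden_dim, input_dim))
        self.Uz = s * rng.standard_normal((hidden_dim, hidden_dim))
        self.Wr = s * rng.standard_normal((hidden_dim, input_dim))
        self.Ur = s * rng.standard_normal((hidden_dim, hidden_dim))
        self.Wh = s * rng.standard_normal((hidden_dim, input_dim))
        self.Uh = s * rng.standard_normal((hidden_dim, hidden_dim))

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)  # how much to update
        r = sigmoid(self.Wr @ x + self.Ur @ h)  # how much past state to expose
        h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        return (1.0 - z) * h + z * h_cand       # blend old state and candidate

# Feed a sequence of hypothetical encoder feature frames through the bottleneck.
cell = GRUCell(input_dim=8, hidden_dim=4)
frames = np.random.default_rng(1).standard_normal((10, 8))
h = np.zeros(4)
for x in frames:
    h = cell.step(x, h)  # h carries context across time frames
print(h.shape)  # (4,)
```

In a full CED-based SE model this cell would sit between the encoder's final feature maps and the decoder, replacing per-frame independence with learned temporal dependencies.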

Cited by 6 publications (2 citation statements) | References 70 publications
“…The model performance was evaluated in terms of perceptual evaluation of speech quality (PESQ) [35] and short-term objective intelligibility (STOI) [36]. The performance of the TFANUNet model is compared against the following baselines: CRN [11], TCNN [37], DCCRN [38], CS-CRN [39], DeepxiMMSE [40], MASENet [25], SADNUNet [17], and MAB-CED [26].…”
Section: Experimental Results Analysis
confidence: 99%
“…In the MAB-CED [26] and U-Transformer + FAT models [27], the TFA method is employed. This method consists of two main components: time-dimension attention and frequency-dimension attention, which work together to generate 1-dimensional attention maps.…”
Section: Introduction
confidence: 99%
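The citation above describes TFA as two 1-dimensional attention maps, one along time and one along frequency. A NumPy sketch of that idea (a hypothetical simplification using average pooling and a sigmoid; the actual TFA modules use learned layers) shows how the two maps combine to reweight a spectrogram:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tfa(spec):
    """Time-frequency attention sketch (hypothetical simplification):
    build one 1-D attention map per axis by average pooling, then
    reweight the spectrogram by their outer product.
    spec: (freq_bins, time_frames) non-negative magnitude spectrogram."""
    freq_att = sigmoid(spec.mean(axis=1))  # (freq_bins,) frequency-dimension map
    time_att = sigmoid(spec.mean(axis=0))  # (time_frames,) time-dimension map
    return spec * np.outer(freq_att, time_att)

# Apply to a random magnitude spectrogram.
rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((257, 100)))  # e.g. 512-point FFT, 100 frames
out = tfa(spec)
print(out.shape)  # (257, 100)
```

Because each attention value lies in (0, 1), the outer product acts as a soft mask: time-frequency regions where both maps are low are attenuated most, which is the mechanism the cited models exploit.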