ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747095

Speech Emotion Recognition with Co-Attention Based Multi-Level Acoustic Information

Citations: cited by 83 publications (28 citation statements)
References: 20 publications

“…(WA / UA)
CTC+Attention [10]: 67.0 / 69.0
Head Fusion [9]: 76.2 / 76.4
HGFM [23]: 66.6 / 70.5
DAAE+CNN+Attention [24]: 70.1 / 70.7
HNSD [25]: 70.5 / 72.5
CNN-ELM+STC attention [12]: 61.3 / 60.4
Multi-level Co-att [11]: 71.6 / 72.7
DKDFMH: 79.1 / 77.1…”
Section: Methods — citation type: mentioning
confidence: 99%
“…Head Fusion was proposed in [9], fusing multiple attention heads within the same attention map. In the field of SER, [10, 11, 12] have shown that the attention mechanism performs well on several datasets, highlighting its effectiveness for emotion classification.…”
Section: Related Work — citation type: mentioning
confidence: 99%
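
As a purely illustrative aside, the attention-based pooling idea referenced in the excerpt above can be sketched in a few lines of PyTorch: a learned score per frame is softmax-normalised over time and used to weight frame-level features into a single utterance vector. This is a minimal sketch of generic self-attentive pooling, not a reproduction of Head Fusion [9] or the multi-level co-attention model [11]; the names AttentivePooling and score are invented for illustration.

    import torch
    import torch.nn as nn

    class AttentivePooling(nn.Module):
        """Softmax-weighted average of frame-level features over time."""
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(dim, 1)          # one scalar score per frame

        def forward(self, frames):                  # frames: (batch, time, dim)
            weights = torch.softmax(self.score(frames), dim=1)   # (batch, time, 1)
            return (weights * frames).sum(dim=1)                 # (batch, dim)

    pool = AttentivePooling(dim=128)
    utterance = pool(torch.randn(2, 300, 128))      # 2 utterances, 300 frames each
    logits = nn.Linear(128, 4)(utterance)           # e.g. a 4-way emotion head
    print(logits.shape)                             # torch.Size([2, 4])
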
“…(WA / UA)
Supervised Methods
  CNN-ELM+STC attention [29]: 61.32 / 60.43
  Audio 25 [30]: 60.64 / 61.32
  IS09-classification [31]: 68.10 / 63.80
  Co-attention-based fusion [32]: 69.80 / 71.05
Self-supervised Methods
  Wav2Vec [33]: 59.79 / -
  Data2Vec Large [34]: 66.31 / -
  WavLM Large [35]: 70.62 / -
  HuBERT Large: 70.24 / 71.13
Data Augmentation Methods
  GAN [10]: - / 53.60
  CycleGAN [12]: - / 60.37
  VTLP [36]: 66.90 / 65.30
  MWA-SER [37]: - / 66.00
  HuBERT Large + CopyPaste [28]: 70.79 / 71.35
  HuBERT Large + Speed Perturbation [9]: 70.35 / …”
Section: Methods — citation type: mentioning
confidence: 99%
“…Based on the Transformer, various self-supervised speech representation learning approaches have also been proposed, including wav2vec [51], wav2vec 2.0 [52] and HuBERT [53]. Built on these pretrained self-supervised models, several studies have delivered promising results in the literature [33], [34], [39], [49], [54]-[57]. Typically, Monica et al. [33] fine-tuned the pretrained HuBERT model for AD detection and achieved competitive performance.…”
Section: B. Transformer in Paralinguistic Speech Processing — citation type: mentioning
confidence: 99%
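
As a rough illustration of the fine-tuning setup mentioned in the excerpt above (a pretrained self-supervised encoder adapted to a downstream classification task), the sketch below fine-tunes a HuBERT encoder for 4-class utterance classification. It assumes the Hugging Face transformers API and the public facebook/hubert-base-ls960 checkpoint, and it is not the cited authors' code; the dummy waveform and label are placeholders.

    import torch
    from transformers import AutoFeatureExtractor, HubertForSequenceClassification

    # Pretrained HuBERT encoder; the classification head on top is newly initialised.
    extractor = AutoFeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
    model = HubertForSequenceClassification.from_pretrained(
        "facebook/hubert-base-ls960", num_labels=4)

    waveform = torch.randn(16000 * 3)               # dummy 3-second, 16 kHz utterance
    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

    labels = torch.tensor([2])                      # dummy class label
    outputs = model(**inputs, labels=labels)        # returns cross-entropy loss + logits
    outputs.loss.backward()                         # gradients flow into the full encoder
    print(outputs.logits.shape)                     # torch.Size([1, 4])
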