2022
DOI: 10.1109/access.2022.3219094

Speech Emotion Recognition Based on Self-Attention Weight Correction for Acoustic and Text Features

Abstract: Speech emotion recognition (SER) is essential for understanding a speaker's intention. Recently, some groups have attempted to improve SER performance using a bidirectional long short-term memory (BLSTM) to extract features from speech sequences and a self-attention mechanism to focus on the important parts of the speech sequences. SER also benefits from combining the information in speech with text, which can be obtained automatically using an automatic speech recognizer (ASR), further improving its performance…
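The abstract outlines an architecture of a BLSTM acoustic encoder, self-attention pooling over its outputs, and fusion with text obtained from an ASR transcript. The following is a minimal sketch of that kind of model, not the authors' implementation; all layer sizes, class names, and the fusion-by-concatenation choice are assumptions for illustration.

```python
# Minimal BLSTM + self-attention SER sketch with late fusion of a text embedding.
# Dimensions and the fusion strategy are illustrative assumptions.
import torch
import torch.nn as nn


class BLSTMSelfAttentionSER(nn.Module):
    def __init__(self, acoustic_dim=40, text_dim=768, hidden=128, n_emotions=4):
        super().__init__()
        self.blstm = nn.LSTM(acoustic_dim, hidden, batch_first=True, bidirectional=True)
        # Single-head additive self-attention over the BLSTM outputs.
        self.attn_score = nn.Linear(2 * hidden, 1)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden + text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_emotions),
        )

    def forward(self, acoustic, text_emb):
        # acoustic: (batch, frames, acoustic_dim); text_emb: (batch, text_dim)
        h, _ = self.blstm(acoustic)                          # (batch, frames, 2*hidden)
        weights = torch.softmax(self.attn_score(h), dim=1)   # attention over frames
        pooled = (weights * h).sum(dim=1)                    # weighted sum of frames
        fused = torch.cat([pooled, text_emb], dim=-1)        # fuse acoustic and text
        return self.classifier(fused)


# Example forward pass with random tensors standing in for real features.
model = BLSTMSelfAttentionSER()
logits = model(torch.randn(2, 300, 40), torch.randn(2, 768))
print(logits.shape)  # torch.Size([2, 4])
```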

Cited by 15 publications (2 citation statements)
References 42 publications
“…This technique outperforms previous state-of-the-art algorithms with a weighted average accuracy of 76.6%, as demonstrated by experiments on the IEMOCAP dataset. Its performance is explored in detail across feature extractors [4]. In order to improve Speech Emotion Recognition (SER), this work combines self-attention and bidirectional long short-term memory (BLSTM) techniques.…”
Section: Related Work
confidence: 99%
“…Confidence scores can help identify words of high recognition quality for better use of ASR transcripts. [9] and [19] adjusted the attention weights for words using their confidence scores. [20] proposed removing words with low confidence and selecting words with the highest confidence from multiple ASR hypotheses [21].…”
Section: Emotion and ASR Confidence
confidence: 99%
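The statement above describes correcting word-level attention weights with ASR confidence scores so that likely-misrecognized words contribute less. Below is one plausible form of such a correction as a sketch; the exact rules used in the cited papers may differ, and the function name and scaling choice are assumptions.

```python
# Sketch: scale word-level attention weights by ASR word confidences and renormalize.
import torch


def confidence_corrected_attention(scores, confidences, eps=1e-8):
    """scores: (batch, words) raw attention logits;
    confidences: (batch, words) ASR word confidences in [0, 1]."""
    weights = torch.softmax(scores, dim=-1)
    corrected = weights * confidences          # down-weight uncertain words
    return corrected / (corrected.sum(dim=-1, keepdim=True) + eps)


scores = torch.tensor([[2.0, 0.5, 1.0]])
confidences = torch.tensor([[0.95, 0.30, 0.80]])
print(confidence_corrected_attention(scores, confidences))
```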