ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414542

LSSED: A Large-Scale Dataset and Benchmark for Speech Emotion Recognition

Abstract: Speech emotion recognition is a vital contributor to the next generation of human-computer interaction (HCI). However, existing small-scale databases have limited the development of related research. In this paper, we present LSSED, a challenging large-scale English speech emotion dataset, with data collected from 820 subjects to simulate real-world distribution. In addition, we release some pre-trained models based on LSSED, which can not only promote the development of speech emotion recognition,…

Cited by 24 publications (8 citation statements) | References 24 publications

“…This study has focused on the VGG-16 (with batch normalization) pre-trained computer vision model, and we have highlighted efficient components for fine-tuning VGG-16 for emotional speech. The work presented in this paper could be extended to include more pre-trained computer vision deep models such as ResNet (He et al., 2016), EfficientNet (Tan and Le, 2019), ViT (Dosovitskiy et al., 2020), and Inception-v3 (GoogLeNet) (Szegedy et al., 2016). Besides, extensive experiments can be performed on other emotional datasets like LSSED (Fan et al., 2021), IEMOCAP (Busso et al., 2008), and RAVDESS (Livingstone and Russo, 2018). Moreover, it could be interesting to include other modalities for emotion recognition, like text, images, and videos.…”
Section: Discussion
confidence: 99%
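The fine-tuning setup described in the excerpt above can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example, not the cited paper's actual configuration: it loads torchvision's VGG-16 (batch norm), swaps the ImageNet head for an emotion classifier, and freezes the convolutional features. The four-class setup, input shaping, and hyperparameters are all assumptions.

```python
# Hypothetical sketch: fine-tuning torchvision's VGG-16 (batch norm) on
# log-mel spectrograms treated as images. Class count, input shaping,
# and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import vgg16_bn, VGG16_BN_Weights

NUM_EMOTIONS = 4  # e.g., angry / happy / neutral / sad (assumption)

model = vgg16_bn(weights=VGG16_BN_Weights.IMAGENET1K_V1)

# Replace the 1000-way ImageNet head with an emotion classifier.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_EMOTIONS)

# Freeze the convolutional features and fine-tune only the classifier --
# one common transfer-learning regime; the cited study investigates which
# components are most efficient to fine-tune.
for p in model.features.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()

# A dummy spectrogram batch: a single log-mel spectrogram replicated
# across 3 channels and resized to 224x224 to match VGG's input.
spectrograms = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_EMOTIONS, (8,))

logits = model(spectrograms)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```

Treating spectrograms as images is what lets pre-trained vision backbones such as VGG-16 transfer to speech emotion recognition in the first place.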
“…They used EMODB [61], a publicly available dataset that offers about 500 utterances covering anger, boredom, disgust, fear, happiness, sadness, and a neutral state. Fan et al. proposed PyResNet to classify emotions from speech [51]. The name comes from modifying the second layer of ResNet with pyramid convolution, which is intended to mitigate the uncertainty in the temporal position of salient speech information.…”
Section: Machine Learning Models and Techniques
confidence: 99%
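The pyramid-convolution idea behind PyResNet can be sketched as a block of parallel convolutions with different kernel sizes whose outputs are concatenated, so one layer covers several temporal/frequency scales at once. Below is a minimal PyTorch sketch in that spirit; the kernel sizes, channel split, and how such a block would be wired into ResNet's second layer are assumptions, not the paper's exact design.

```python
# Hypothetical pyramid-convolution block: parallel convolutions with
# different kernel sizes, concatenated channel-wise. Kernel sizes and
# the even channel split are illustrative assumptions.
import torch
import torch.nn as nn

class PyramidConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        assert out_channels % len(kernel_sizes) == 0
        branch_channels = out_channels // len(kernel_sizes)
        # One branch per kernel size; odd kernels with padding=k//2
        # preserve the spatial dimensions.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_channels, branch_channels, k, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x):
        # Each branch has a different receptive field; concatenation lets
        # later layers pick whichever scale carries the emotional cue.
        return torch.cat([branch(x) for branch in self.branches], dim=1)

# Example: map a 64-channel feature map to 128 channels.
block = PyramidConv2d(64, 128)
features = torch.randn(2, 64, 32, 32)
out = block(features)
print(out.shape)  # torch.Size([2, 128, 32, 32])
```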
“…However, only angry, happy, neutral, and sad are selected in this paper. There are also many real-world corpora, such as LSSED [52].…”
Section: Data Preparation
confidence: 99%
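The four-class selection mentioned in this excerpt amounts to filtering a corpus index down to the chosen labels. A minimal sketch, assuming a hypothetical list of (path, label) pairs rather than any corpus's real metadata format:

```python
# Hypothetical sketch: restrict a corpus to the four emotion classes
# named above. The (path, label) index is an illustrative stand-in for
# whatever metadata file a given corpus actually ships with.
SELECTED = {"angry", "happy", "neutral", "sad"}

corpus = [
    ("wav/0001.wav", "angry"),
    ("wav/0002.wav", "surprise"),  # dropped: not in the selected set
    ("wav/0003.wav", "sad"),
]

subset = [(path, label) for path, label in corpus if label in SELECTED]

# Map the surviving labels to contiguous integer ids for training.
label_to_id = {label: i for i, label in enumerate(sorted(SELECTED))}
dataset = [(path, label_to_id[label]) for path, label in subset]
```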