2022
DOI: 10.1016/j.specom.2022.07.005

GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition

Cited by 32 publications (7 citation statements)
References 57 publications
“…To solve the problem that the Deep Autoencoding Gaussian Mixture Model (DAGMM) is defective in preserving the input topology, Chen et al [26] proposed a Self-Organizing Map-assisted depth Autoencoding Gaussian Mixture Model (SOM-DAGMM), which overcomes the above drawbacks of DAGMM by well balancing the low dimensionality requirement and topology preservation requirement of Gaussian Mixture Model (GMM). In addition, Ye et al [27] explored how to model sequence information in dynamic time scales for learning multi-scale contextual sentiment representations at different scales. This method provides new ideas for network traffic feature representation in industrial Internet.…”
Section: Supervised Learning-based Intrusion Detection (mentioning)
confidence: 99%
“…For two identical sub-blocks in the j-th MRBlock, the Dilated Causal Convolution in each MRBlock starts with a dilation rate of $2^{j-1}$. The exponentially growing dilation rate allows a rapid increase in the receptive field to capture a more extended range of dependencies [27]. Also, the causal constraints ensure that the model does not leak future information. We use the add operation and the global average pooling operation to fuse forward and backward complementary contextual information about network traffic, as implemented below:…”
Section: Improved Multi-scale Temporal Convolutional Network (mentioning)
confidence: 99%
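
The dilation schedule described in the statement above can be illustrated with a small sketch. This is not code from either paper: the function names, the kernel size of 2, and the pure-Python convolution are assumptions chosen for illustration. Only the dilation rate of 2^(j-1) for the j-th block and the two sub-blocks per block are taken from the quoted text.

```python
def dilation_rates(num_blocks):
    """Dilation rate 2^(j-1) for blocks j = 1..num_blocks (from the quoted text)."""
    return [2 ** (j - 1) for j in range(1, num_blocks + 1)]


def receptive_field(num_blocks, kernel_size=2, convs_per_block=2):
    """Receptive field, in time steps, of the stacked dilated causal convolutions.

    kernel_size=2 is an assumed value; convs_per_block=2 mirrors the
    "two identical sub-blocks" mentioned in the quoted text.
    Each convolution widens the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for d in dilation_rates(num_blocks):
        rf += convs_per_block * (kernel_size - 1) * d
    return rf


def causal_dilated_conv(x, weights, dilation):
    """1-D causal dilated convolution over a list of numbers.

    Output at step t uses only x[t], x[t - dilation], x[t - 2*dilation], ...
    Positions before the start of the sequence are treated as zero padding,
    so no future sample ever contributes to an output.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, dilation_rates(n)[-1], receptive_field(n))
```

With these assumed settings the receptive field roughly doubles with each added block while the depth grows only linearly. That is the property the quoted passage attributes to exponentially growing dilation rates.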
“…According to the Hoehn and Yahr scale, speech data in the MDVR-KCL dataset are classified into four categories: healthy individuals, PD1 level, PD2 level, and PD3 level, which is completely marked by expert evaluation scores. We compared five deep learning methods [48][49][50][51], including models based on convolutional neural networks, transformers, and transfer learning for speech emotion recognition, and achieved good results, which is sufficient to prove that these deep models can evaluate the severity of Parkinson's disease speech, and the effectiveness of the proposed TmmNet is also remarkable.…”
Section: Global Evaluation (mentioning)
confidence: 98%
“…Ye et al [11] evaluate multi-scaled gated TCNs with Log Mel spectrograms and emotion causality loss on Ravdess (70.2% unweighted). For augmentation, Pan and Wu [14] use pitch shifting and noise on Ravdess to improve a 77.8% CNN-LSTM baseline, while Abdelhamid et al [15] tune hyperparameters with fractal search on Ravdess Extended.…”
Section: Literature Review (mentioning)
confidence: 99%