2016 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2016.7727636
Convolutional RNN: An enhanced model for extracting features from sequential data

Abstract: Traditional convolutional layers extract features from patches of data by applying a non-linearity on an affine function of the input. We propose a model that enhances this feature extraction process for the case of sequential data, by feeding patches of the data into a recurrent neural network and using the outputs or hidden states of the recurrent units to compute the extracted features. By doing so, we exploit the fact that a window containing a few frames of the sequential data is a sequence itself and thi…
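The abstract describes replacing a convolutional layer's affine map and non-linearity with a small recurrent network run over the frames inside each window, whose output (or final hidden state) becomes the feature for that window. Below is a minimal sketch of that idea, assuming PyTorch and a GRU as the recurrent unit; both choices, and all layer sizes, are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class ConvRNNLayer(nn.Module):
    """Sketch: each sliding window of frames is itself a short sequence, so a
    small RNN is run over the frames of every window and its final hidden
    state is used as that window's extracted feature (in place of the usual
    affine map + non-linearity of a conv layer)."""
    def __init__(self, in_dim, out_dim, window, stride=1):
        super().__init__()
        self.window, self.stride = window, stride
        # GRU is an illustrative choice of recurrent unit.
        self.rnn = nn.GRU(in_dim, out_dim, batch_first=True)

    def forward(self, x):                    # x: (batch, time, in_dim)
        # Extract overlapping windows -> (batch, n_windows, window, in_dim)
        patches = x.unfold(1, self.window, self.stride).permute(0, 1, 3, 2)
        b, n, w, d = patches.shape
        # Run the RNN over the frames of every window independently.
        _, h = self.rnn(patches.reshape(b * n, w, d))
        return h[-1].reshape(b, n, -1)       # (batch, n_windows, out_dim)
```

For example, applied to a (2, 100, 40) input with window 5, stride 1, and out_dim 64, the layer returns a (2, 96, 64) feature map.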

Cited by 119 publications (78 citation statements)
References 36 publications (52 reference statements)
“…DL has been shown to significantly boost emotion recognition performance [17,18,19,20,21,22]. Recently, several papers [23,24] presented CNNs in combination with Long Short-Term Memory models (LSTM) to improve speech emotion recognition based on log Mel filter-banks (logMel) or raw signal. [24] demonstrated an end-to-end training from raw signal.…”
Section: Introduction (mentioning)
confidence: 99%
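The excerpt above points to CNN+LSTM pipelines over log-Mel filter-banks (or raw signal) for speech emotion recognition. A generic sketch of such a combination is given below, assuming PyTorch; the layer sizes, pooling scheme, and four-class output are illustrative assumptions, not the cited papers' architectures.

```python
import torch
import torch.nn as nn

class CNNLSTMEmotion(nn.Module):
    """Generic CNN+LSTM classifier over log-Mel features: convolutions pick up
    local time-frequency patterns, an LSTM summarises them over the utterance."""
    def __init__(self, n_mels=40, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.lstm = nn.LSTM(64 * (n_mels // 4), 128, batch_first=True)
        self.out = nn.Linear(128, n_classes)

    def forward(self, logmel):                # logmel: (batch, time, n_mels)
        f = self.conv(logmel.unsqueeze(1))    # (batch, 64, time/4, n_mels/4)
        f = f.permute(0, 2, 1, 3).flatten(2)  # (batch, time/4, 64 * n_mels/4)
        _, (h, _) = self.lstm(f)
        return self.out(h[-1])                # utterance-level class scores
```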
“…In this study, we propose a non-linguistic approach for detecting AD using acoustic features from speech data. Inspired by numerous successes with deep learning for paralinguistic tasks such as for emotion recognition [10,11], we employ convolutional neural networks with a gating mechanism.…”
Section: Introduction (mentioning)
confidence: 99%
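The excerpt above mentions convolutional networks with a gating mechanism. One common form is GLU-style multiplicative gating, sketched below in PyTorch; whether the cited AD-detection model gates its convolutions exactly this way is an assumption here.

```python
import torch
import torch.nn as nn

class GatedConv1d(nn.Module):
    """A convolution whose output is modulated by a learned sigmoid gate
    (GLU-style gating); the sizes and the 1-D setting are illustrative."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.feat = nn.Conv1d(in_ch, out_ch, kernel_size, padding=pad)
        self.gate = nn.Conv1d(in_ch, out_ch, kernel_size, padding=pad)

    def forward(self, x):                     # x: (batch, channels, time)
        # The gate decides, element-wise, how much of each feature passes through.
        return self.feat(x) * torch.sigmoid(self.gate(x))
```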
“…Although DNNs can be used with spatio-temporal data, they are not always appropriate because they do not naturally accommodate dependence structures that occur in time and space. However, given the modularity of CNNs and RNNs (i.e., they are easily "stacked" to make deeper models) it is no surprise that they can easily be combined in different ways to produce deep hybrid models for spatio-temporal data, such as video image processing and image captioning (e.g., Keren and Schuller, 2016; Tong and Tanaka, 2018). For example, images in a video can be reduced by a CNN to find spatial features and the time evolution of these features can then be modeled with an RNN (usually an LSTM).…”
Section: Deep Neural DSTMs (DN-DSTMs) (mentioning)
confidence: 99%
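The excerpt above describes the usual hybrid in which a CNN reduces each video frame to a spatial feature vector and an RNN (typically an LSTM) models the temporal evolution of those features. A minimal PyTorch sketch under those assumptions follows; the network sizes are illustrative.

```python
import torch
import torch.nn as nn

class FrameCNNThenLSTM(nn.Module):
    """Per-frame CNN features followed by an LSTM over time, as in common
    CNN+RNN hybrids for spatio-temporal data (sizes are illustrative)."""
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)

    def forward(self, video):                 # video: (batch, time, 3, H, W)
        b, t = video.shape[:2]
        # Reduce every frame to a spatial feature vector, then model time.
        feats = self.cnn(video.flatten(0, 1)).reshape(b, t, -1)
        out, _ = self.lstm(feats)             # hidden state per time step
        return out
```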