2016 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2016.7727636
Convolutional RNN: An enhanced model for extracting features from sequential data

Abstract: Traditional convolutional layers extract features from patches of data by applying a non-linearity on an affine function of the input. We propose a model that enhances this feature extraction process for the case of sequential data, by feeding patches of the data into a recurrent neural network and using the outputs or hidden states of the recurrent units to compute the extracted features. By doing so, we exploit the fact that a window containing a few frames of the sequential data is a sequence itself and thi…
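The abstract describes replacing a convolutional layer's affine map and non-linearity with a small recurrent network run over the frames inside each window, whose output (or final hidden state) becomes the feature for that window. Below is a minimal sketch of that idea, assuming PyTorch and a GRU as the recurrent unit; both choices, and all layer sizes, are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class ConvRNNLayer(nn.Module):
    """Sketch: each sliding window of frames is itself a short sequence, so a
    small RNN is run over the frames of every window and its final hidden
    state is used as that window's extracted feature (in place of the usual
    affine map + non-linearity of a conv layer)."""
    def __init__(self, in_dim, out_dim, window, stride=1):
        super().__init__()
        self.window, self.stride = window, stride
        # GRU is an illustrative choice of recurrent unit.
        self.rnn = nn.GRU(in_dim, out_dim, batch_first=True)

    def forward(self, x):                    # x: (batch, time, in_dim)
        # Extract overlapping windows -> (batch, n_windows, window, in_dim)
        patches = x.unfold(1, self.window, self.stride).permute(0, 1, 3, 2)
        b, n, w, d = patches.shape
        # Run the RNN over the frames of every window independently.
        _, h = self.rnn(patches.reshape(b * n, w, d))
        return h[-1].reshape(b, n, -1)       # (batch, n_windows, out_dim)
```

For example, applied to a (2, 100, 40) input with window 5, stride 1, and out_dim 64, the layer returns a (2, 96, 64) feature map.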

Cited by 119 publications (78 citation statements)
References 36 publications (52 reference statements)
“…DL has been shown to significantly boost emotion recognition performance [17,18,19,20,21,22]. Recently, several papers [23,24] presented CNNs in combination with Long Short-Term Memory models (LSTM) to improve speech emotion recognition based on log Mel filter-banks (logMel) or raw signal. [24] demonstrated an end-to-end training from raw signal.…”
Section: Introduction (mentioning)
confidence: 99%
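The excerpt above points to CNN+LSTM pipelines over log-Mel filter-banks (or raw signal) for speech emotion recognition. A generic sketch of such a combination is given below, assuming PyTorch; the layer sizes, pooling scheme, and four-class output are illustrative assumptions, not the cited papers' architectures.

```python
import torch
import torch.nn as nn

class CNNLSTMEmotion(nn.Module):
    """Generic CNN+LSTM classifier over log-Mel features: convolutions pick up
    local time-frequency patterns, an LSTM summarises them over the utterance."""
    def __init__(self, n_mels=40, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.lstm = nn.LSTM(64 * (n_mels // 4), 128, batch_first=True)
        self.out = nn.Linear(128, n_classes)

    def forward(self, logmel):                # logmel: (batch, time, n_mels)
        f = self.conv(logmel.unsqueeze(1))    # (batch, 64, time/4, n_mels/4)
        f = f.permute(0, 2, 1, 3).flatten(2)  # (batch, time/4, 64 * n_mels/4)
        _, (h, _) = self.lstm(f)
        return self.out(h[-1])                # utterance-level class scores
```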
“…In this study, we propose a non-linguistic approach for detecting AD using acoustic features from speech data. Inspired by numerous successes with deep learning for paralinguistic tasks such as for emotion recognition [10,11], we employ convolutional neural networks with a gating mechanism.…”
Section: Introduction (mentioning)
confidence: 99%
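The excerpt above mentions convolutional networks with a gating mechanism. One common form is GLU-style multiplicative gating, sketched below in PyTorch; whether the cited AD-detection model gates its convolutions exactly this way is an assumption here.

```python
import torch
import torch.nn as nn

class GatedConv1d(nn.Module):
    """A convolution whose output is modulated by a learned sigmoid gate
    (GLU-style gating); the sizes and the 1-D setting are illustrative."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.feat = nn.Conv1d(in_ch, out_ch, kernel_size, padding=pad)
        self.gate = nn.Conv1d(in_ch, out_ch, kernel_size, padding=pad)

    def forward(self, x):                     # x: (batch, channels, time)
        # The gate decides, element-wise, how much of each feature passes through.
        return self.feat(x) * torch.sigmoid(self.gate(x))
```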
“…Although DNNs can be used with spatio-temporal data, they are not always appropriate because they do not naturally accommodate dependence structures that occur in time and space. However, given the modularity of CNNs and RNNs (i.e., they are easily "stacked" to make deeper models) it is no surprise that they can easily be combined in different ways to produce deep hybrid models for spatio-temporal data, such as video image processing and image captioning (e.g., Keren and Schuller, 2016; Tong and Tanaka, 2018). For example, images in a video can be reduced by a CNN to find spatial features and the time evolution of these features can then be modeled with an RNN (usually an LSTM).…”
Section: Deep Neural DSTMs (DN-DSTMs) (mentioning)
confidence: 99%
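The excerpt above describes the usual hybrid in which a CNN reduces each video frame to a spatial feature vector and an RNN (typically an LSTM) models the temporal evolution of those features. A minimal PyTorch sketch under those assumptions follows; the network sizes are illustrative.

```python
import torch
import torch.nn as nn

class FrameCNNThenLSTM(nn.Module):
    """Per-frame CNN features followed by an LSTM over time, as in common
    CNN+RNN hybrids for spatio-temporal data (sizes are illustrative)."""
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)

    def forward(self, video):                 # video: (batch, time, 3, H, W)
        b, t = video.shape[:2]
        # Reduce every frame to a spatial feature vector, then model time.
        feats = self.cnn(video.flatten(0, 1)).reshape(b, t, -1)
        out, _ = self.lstm(feats)             # hidden state per time step
        return out
```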