To give more humanized responses in Voice Dialogue Applications (VDAs), inferring emotional states from users' queries can play an important role. However, VDAs serve a tremendous number of users and generate massive amounts of unlabeled, high-dimensional multimodal data, which challenges traditional speech emotion recognition methods. In this paper, to better infer emotion from conversational voice data, we propose a semi-supervised multi-path generative neural network. Specifically, we first build a novel supervised multi-path deep neural network framework. To avoid high-dimensional input, raw features are trained in groups by local classifiers. The high-level features of the local classifiers are then concatenated as input to a global classifier. These two kinds of classifiers are trained simultaneously through a single objective function to achieve more effective and discriminative emotion inference. To further address the scarcity of labeled data, we extend the multi-path deep neural network to a generative model based on a semi-supervised variational autoencoder (semi-VAE), which can train on labeled and unlabeled data simultaneously. Experiments on a 24,000-sample real-world dataset collected from the Sogou Voice Assistant (SVAD13) and on the benchmark IEMOCAP dataset show that our method significantly outperforms existing state-of-the-art results.
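The abstract does not give implementation details, but the supervised multi-path idea (grouped local classifiers whose hidden features feed a global classifier, all trained under one objective) can be sketched as follows. This is a minimal PyTorch sketch under stated assumptions: group_dims, hidden_dim, and the weight alpha are hypothetical placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

class MultiPathNet(nn.Module):
    """Sketch: each feature group feeds a local classifier; the local
    hidden features are concatenated as input to a global classifier."""

    def __init__(self, group_dims, hidden_dim, num_classes):
        super().__init__()
        # One encoder + classification head per feature group (hypothetical sizes).
        self.local_encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden_dim), nn.ReLU()) for d in group_dims]
        )
        self.local_heads = nn.ModuleList(
            [nn.Linear(hidden_dim, num_classes) for _ in group_dims]
        )
        self.global_head = nn.Linear(hidden_dim * len(group_dims), num_classes)

    def forward(self, groups):
        hidden = [enc(x) for enc, x in zip(self.local_encoders, groups)]
        local_logits = [head(h) for head, h in zip(self.local_heads, hidden)]
        global_logits = self.global_head(torch.cat(hidden, dim=1))
        return local_logits, global_logits

def joint_loss(local_logits, global_logits, labels, alpha=0.5):
    # Single objective: global loss plus a weighted sum of local losses,
    # so both kinds of classifiers are trained simultaneously.
    ce = nn.functional.cross_entropy
    loss = ce(global_logits, labels)
    loss = loss + alpha * sum(ce(logits, labels) for logits in local_logits)
    return loss
```

The semi-VAE extension described in the abstract would replace this purely supervised objective with a variational one that also covers unlabeled data; that part is omitted here since the abstract gives no architectural specifics.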
Discrepancies between the distributions of the training and test data, also known as domain shift, reduce the generalization of emotion recognition methods. One of the main factors contributing to these discrepancies is human variability. Domain adaptation methods have been developed to alleviate domain shift; however, while these techniques reduce between-database variation, they fail to reduce between-subject variability. In this paper, we propose an adversarial deep domain adaptation approach for emotion recognition from electroencephalogram (EEG) signals. The method jointly learns a new representation that minimizes the emotion recognition loss and maximizes the subject confusion loss. We demonstrate that the proposed representation improves emotion recognition performance both within and across databases.
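One common way to realize this joint minimize/maximize objective is a gradient reversal layer, as in domain-adversarial training. The abstract does not confirm this is the authors' mechanism, so the PyTorch sketch below is an assumption-laden illustration of the objective, not the paper's implementation; all layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass; negates gradients on the backward pass,
    # so the encoder ascends the subject loss (subject confusion) while the
    # subject head still descends it.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class AdversarialEEGNet(nn.Module):
    def __init__(self, in_dim, feat_dim, num_emotions, num_subjects):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.emotion_head = nn.Linear(feat_dim, num_emotions)
        self.subject_head = nn.Linear(feat_dim, num_subjects)

    def forward(self, x, lam=1.0):
        z = self.encoder(x)
        emotion_logits = self.emotion_head(z)
        # The subject branch sees gradient-reversed features.
        subject_logits = self.subject_head(GradReverse.apply(z, lam))
        return emotion_logits, subject_logits
```

Training then simply minimizes the sum of the two cross-entropy losses; the reversal layer turns the subject term into a confusion objective from the encoder's perspective.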