Emotion recognition datasets are relatively small, making the use of the more sophisticated deep learning approaches challenging. In this work, we propose a transfer learning method for speech emotion recognition where features extracted from pre-trained wav2vec 2.0 models are modeled using simple neural networks. We propose to combine the output of several layers from the pre-trained model using trainable weights which are learned jointly with the downstream model. Further, we compare performance using two different wav2vec 2.0 models, with and without finetuning for speech recognition. We evaluate our proposed approaches on two standard emotion databases IEMOCAP and RAVDESS, showing superior performance compared to results in the literature.
What are the features that impersonators select to elicit a speaker's identity? We built a voice database of public figures (targets) and imitations produced by professional impersonators. They produced one imitation based on their memory of the target (caricature) and another one after listening to the target audio (replica). A set of naive participants then judged identity and similarity of pairs of voices. Identity was better evoked by the caricatures and replicas were perceived to be closer to the targets in terms of voice similarity. We used this data to map relevant acoustic dimensions for each task. Our results indicate that speaker identity is mainly associated with vocal tract features, while perception of voice similarity is related to vocal folds parameters. We therefore show the way in which acoustic caricatures emphasize identity features at the cost of loosing similarity, which allows drawing an analogy with caricatures in the visual space.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.