“…In this paper, we focus on the first track of the challenge, ExVo Multi-Task learning. Taking inspiration from recent works on the use of embeddings from pre-trained networks for various speech processing tasks, including paralinguistic tasks (Keesing et al., 2021; Yang et al., 2021; Mostaani et al., 2022; Srinivasan et al., 2022), we investigate the utility of neural embeddings for estimating speakers' emotion intensity, native country, and age. In that regard, as illustrated in Figure 1, we compare two types of neural embedding extraction approaches: (a) neural embeddings extracted from neural networks trained in a self-supervised learning (SSL) setting and (b) neural embeddings extracted from neural networks trained on auxiliary out-of-domain tasks, such as SER and phone classification, and on the in-domain ExVo challenge task.…”