ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054548

Unsupervised Pretraining Transfers Well Across Languages

Abstract: Cross-lingual and multi-lingual training of Automatic Speech Recognition (ASR) has been extensively investigated in the supervised setting. This assumes the existence of a parallel corpus of speech and orthographic transcriptions. Recently, contrastive predictive coding (CPC) algorithms have been proposed to pretrain ASR systems with unlabelled data. In this work, we investigate whether unsupervised pretraining transfers well across languages. We show that a slight modification of the CPC pretraining extracts …
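
As a rough illustration of the CPC objective mentioned in the abstract, the sketch below implements the standard InfoNCE loss in NumPy with made-up shapes and random data. It is not the authors' implementation; it only shows the general idea of scoring the encoding of the true future frame against negative samples.

```python
# Minimal sketch of a CPC-style InfoNCE objective (illustrative only, not the
# paper's code): a context vector must assign a higher score to the encoding of
# the true future frame than to N distractor ("negative") encodings.
import numpy as np

def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def info_nce_loss(context, positives, negatives):
    """context:   (T, d)    context vectors c_t
       positives: (T, d)    encodings of the true future frames z_{t+k}
       negatives: (T, N, d) encodings of N distractor frames per step"""
    pos = np.einsum("td,td->t", context, positives)        # (T,)   score of the positive
    neg = np.einsum("td,tnd->tn", context, negatives)      # (T, N) scores of the negatives
    scores = np.concatenate([pos[:, None], neg], axis=1)   # (T, N+1)
    return -(pos - logsumexp(scores, axis=1)).mean()       # cross-entropy on the positive

# toy example: 50 time steps, 256-dim features, 10 negatives per step
rng = np.random.default_rng(0)
loss = info_nce_loss(rng.normal(size=(50, 256)),
                     rng.normal(size=(50, 256)),
                     rng.normal(size=(50, 10, 256)))
print(loss)
```

In the cross-lingual setting studied in the paper, an encoder pretrained with this kind of objective on unlabelled speech in one language is then reused for ASR in other languages.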

Cited by 159 publications (159 citation statements) · References 28 publications

“…It is possible to use a variational approach to estimate a bound on the mutual information between continuous, high-dimensional quantities (Donsker & Varadhan, 1983; Nguyen et al, 2010; Alemi et al, 2016; Belghazi et al, 2018; Oord et al, 2018; Poole et al, 2019). Recent works capture this intuition to yield self-supervised embeddings in the modalities of imaging (Oord et al, 2018; Hjelm et al, 2018; Bachman et al, 2019; Tian et al, 2019; Hénaff et al, 2019; Löwe et al, 2019; He et al, 2019; Chen et al, 2020; Tian et al, 2020; Wang & Isola, 2020), text (Rivière et al, 2020; Oord et al, 2018; Kong et al, 2019), and audio (Löwe et al, 2019; Oord et al, 2018), with high empirical downstream performance.…”
Section: Background and Related Work (mentioning; confidence: 99%)
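
For context, the InfoNCE objective of Oord et al. (2018) cited in the statement above is the variational bound being referred to. Writing the loss over a set X of N samples (one positive drawn from p(x_{t+k} | c_t) and N−1 negatives), the standard result is:

```latex
% InfoNCE loss and the resulting mutual-information bound (Oord et al., 2018)
\mathcal{L}_N \;=\; -\,\mathbb{E}_{X}\!\left[\log
      \frac{f_k(x_{t+k}, c_t)}{\sum_{x_j \in X} f_k(x_j, c_t)}\right],
\qquad
I(x_{t+k}; c_t) \;\ge\; \log N \;-\; \mathcal{L}_N .
```

Here f_k is a learned positive scoring function, so minimizing the loss maximizes a lower bound on the mutual information between the future frame and the context.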
“…Recent works have demonstrated an interest in unsupervised representation learning as a pretraining method to obtain good speech features for downstream tasks with little labelled data [1,2,3,4,5]. While Contrastive Predictive Coding (CPC) and derivatives appear to be versatile methods for unsupervised representation learning [6,7,8], they do not yet reach the state-of-the-art (SOTA) results on purely unsupervised learning metrics [2,6,9,10].…”
Section: Introduction (mentioning; confidence: 99%)
“…This confirms our hypothesis that wav2vec representations remove speaker information from the speech signal.…”
Section: Better Source-target Alignments (mentioning; confidence: 99%)
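
A common way to check a claim like this (sketched here under assumed data and labels, not the cited paper's exact protocol) is to fit a linear speaker probe on frozen representations and compare its accuracy to chance:

```python
# Hedged sketch of a speaker probe (assumed setup, random placeholder data):
# near-chance accuracy of a linear classifier on frozen features is taken as
# evidence that the features carry little speaker information.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
feats = rng.normal(size=(2000, 512))        # stand-in for frozen wav2vec/CPC frame features
speakers = rng.integers(0, 20, size=2000)   # stand-in speaker labels (20 speakers)

X_tr, X_te, y_tr, y_te = train_test_split(feats, speakers, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"speaker-probe accuracy: {probe.score(X_te, y_te):.3f} (chance is 0.05)")
```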
“…[5] shows that such representations are useful to improve several speech tasks while [4] extends those works by looking at the representations' robustness to domain and language shifts. In the same vein, [10] compares self-supervised and supervised pre-training for ASR and shows that CPC pre-training extracts features that transfer well to other languages, being on par or even outperforming supervised pretraining. Another promising way is to use speech enhancement as a task for feature representation learning [11,12].…”
Section: Introduction (mentioning; confidence: 99%)
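
The transfer recipe this statement describes (pretrain without labels, then reuse the encoder on another language) can be sketched as follows. This is an assumed PyTorch setup for illustration, not the released code of [10], and the toy encoder only mimics the output shape of a real CPC model:

```python
# Hedged sketch of frozen-feature transfer: keep the pretrained encoder fixed
# and train only a linear phone classifier on the target language.
import torch
import torch.nn as nn

class FrozenTransfer(nn.Module):
    def __init__(self, pretrained_encoder, feat_dim, n_phones):
        super().__init__()
        self.encoder = pretrained_encoder.requires_grad_(False)  # frozen pretrained encoder
        self.head = nn.Linear(feat_dim, n_phones)                # only this layer is trained

    def forward(self, wav):
        with torch.no_grad():
            feats = self.encoder(wav)      # (batch, frames, feat_dim)
        return self.head(feats)            # frame-level phone logits

# toy stand-in for a pretrained encoder: 16 000-sample waveform -> 160 frames x 256 dims
toy_encoder = nn.Sequential(nn.Unflatten(1, (160, 100)), nn.Linear(100, 256))
model = FrozenTransfer(toy_encoder, feat_dim=256, n_phones=40)
logits = model(torch.randn(2, 16000))      # two dummy waveforms
print(logits.shape)                        # torch.Size([2, 160, 40])
```

Running the same probe head over a CPC encoder pretrained on another language versus a supervised pretrained encoder gives the kind of comparison the citing paper attributes to [10].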