Unsupervised Pretraining Transfers Well Across Languages

Rivière, Morgane; Joulin, Armand; Mazaré, Pierre-Emmanuel; Dupoux, Emmanuel

doi:10.1109/icassp40776.2020.9054548

Cited by 159 publications

(159 citation statements)

References 28 publications

Citation types: Supporting, Mentioning (154), Contrasting, Unclassified


“…It is possible to use a variational approach to estimate a bound on the mutual information between continuous, high-dimensional quantities (Donsker & Varadhan, 1983; Nguyen et al, 2010; Alemi et al, 2016; Belghazi et al, 2018; Oord et al, 2018; Poole et al, 2019). Recent works capture this intuition to yield self-supervised embeddings in the modalities of imaging (Oord et al, 2018; Hjelm et al, 2018; Bachman et al, 2019; Tian et al, 2019; Hénaff et al, 2019; Löwe et al, 2019; He et al, 2019; Chen et al, 2020; Tian et al, 2020; Wang & Isola, 2020), text (Rivière et al, 2020; Oord et al, 2018; Kong et al, 2019), and audio (Löwe et al, 2019; Oord et al, 2018), with high empirical downstream performance.…”

Section: Background and Related Work (mentioning)

confidence: 99%

Self-Supervised Contrastive Learning of Protein Representations By Mutual Information Maximization

Zhang, Ghassemi, et al. 2020

Preprint

Pretrained embedding representations of biological sequences which capture meaningful properties can alleviate many problems associated with supervised learning in biology. We apply the principle of mutual information maximization between local and global information as a self-supervised pretraining signal for protein embeddings. To do so, we divide protein sequences into fixed size fragments, and train an autoregressive model to distinguish between subsequent fragments from the same protein and fragments from random proteins. Our model, CPCProt, achieves comparable performance to state-of-the-art self-supervised models for protein sequence embeddings on various downstream tasks, but reduces the number of parameters down to 0.9% to 8.9% of benchmarked models. Further, we explore how downstream assessment protocols affect embedding evaluation, and the effect of contrastive learning hyperparameters on empirical performance. We hope that these results will inform the development of contrastive learning methods in protein biology and other modalities.
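The contrastive objective described in this abstract (telling a true subsequent fragment apart from fragments of random proteins) can be sketched as an InfoNCE-style loss. This is a minimal illustration, not CPCProt's implementation: the function names `fragments` and `info_nce_loss`, the dot-product scoring, and the choice to drop a trailing partial fragment are all assumptions made for the example.

```python
import math


def fragments(sequence, size):
    """Split a sequence into fixed-size fragments.

    Assumption: a trailing partial fragment is dropped.
    """
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]


def info_nce_loss(context, positive, negatives):
    """InfoNCE-style loss for one context vector (hypothetical helper).

    context:   embedding summarizing the fragments seen so far
    positive:  embedding of the true subsequent fragment
    negatives: embeddings of fragments drawn from other sequences
    Scores are plain dot products, which is an assumption of this sketch.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # The positive score goes first so it can be picked out after normalization.
    scores = [dot(context, positive)] + [dot(context, n) for n in negatives]

    # Numerically stable log-sum-exp over all candidate scores.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))

    # Negative log-probability of choosing the positive among all candidates.
    return log_norm - scores[0]
```

When the positive scores much higher than every negative, the loss approaches zero; when the positive is indistinguishable from K negatives, it equals log(K + 1).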


“…Recent works have demonstrated an interest in unsupervised representation learning as a pretraining method to obtain good speech features for downstream tasks with little labelled data [1,2,3,4,5]. While Contrastive Predictive Coding (CPC) and derivatives appear to be versatile methods for unsupervised representation learning [6,7,8], they do not yet reach the state-of-the-art (SOTA) results on purely unsupervised learning metrics [2,6,9,10].…”

Section: Introduction (mentioning)

confidence: 99%

Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

Kharitonov, Rivière, Synnaeve, et al. 2021

2021 IEEE Spoken Language Technology Workshop (SLT)

Self Cite

Contrastive Predictive Coding (CPC), based on predicting future segments of speech based on past segments is emerging as a powerful algorithm for representation learning of speech signal. However, it still under-performs other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library and find that applying augmentation in the past is generally more efficient and yields better performances than other methods. We find that a combination of pitch modification, additive noise and reverberation substantially increase the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation consistently improves downstream limited-supervision phoneme classification tasks by a factor of 12-15% relative.


“…This confirms our hypothesis that wav2vec representations remove speaker information from speech signal. 10…”

Section: Better Source-target Alignments (mentioning)

confidence: 99%

“…[5] shows that such representations are useful to improve several speech tasks while [4] extends those works by looking at the representations' robustness to domain and language shifts. In the same vein, [10] compares self-supervised and supervised pre-training for ASR and shows that CPC pre-training extracts features that transfer well to other languages, being on par or even outperforming supervised pretraining. Another promising way is to use speech enhancement as a task for feature representation learning [11,12].…”

Section: Introduction (mentioning)

confidence: 99%


Investigating Self-Supervised Pre-Training for End-to-End Speech Translation

Nguyen, Bougares, Tomashenko, et al. 2020

Interspeech 2020

Self-supervised learning from raw speech has been proven beneficial to improve automatic speech recognition (ASR). We investigate here its impact on end-to-end automatic speech translation (AST) performance. We use a contrastive predictive coding (CPC) model pre-trained from unlabeled speech as a feature extractor for a downstream AST task. We show that self-supervised pre-training is particularly efficient in low resource settings and that fine-tuning CPC models on the AST training data further improves performance. Even in higher resource settings, ensembling AST models trained with filter-bank and CPC representations leads to near state-of-the-art models without using any ASR pre-training. This might be particularly beneficial when one needs to develop a system that translates from speech in a language with poorly standardized orthography or even from speech in an unwritten language.
