2021
DOI: 10.48550/arxiv.2106.01463
Preprint

Lightweight Adapter Tuning for Multilingual Speech Translation

Hang Le,
Juan Pino,
Changhan Wang
et al.

Abstract: Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP. Adapter tuning consists of freezing the pretrained parameters of a model and injecting lightweight modules between layers, resulting in the addition of only a small number of task-specific trainable parameters. While adapter tuning was investigated for multilingual neural machine translation, this paper proposes a comprehensive analysis of adapters for multilingual speech translation (ST). Starting from different pre-trained…
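The adapter-tuning recipe described in the abstract can be illustrated with a minimal sketch: a small bottleneck module is inserted after each frozen layer, and only these modules are trained. This is not the authors' released code; the class and function names (Adapter, add_adapters, bottleneck) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: LayerNorm -> down-projection -> ReLU -> up-projection,
    added back to the input through a residual connection."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(self.norm(x))))

def add_adapters(model: nn.Module, d_model: int, num_layers: int) -> nn.ModuleList:
    # Adapter tuning: freeze every pretrained parameter of the backbone,
    # then optimize only the lightweight adapters (e.g. one per layer and
    # per language pair in the multilingual ST setting).
    for p in model.parameters():
        p.requires_grad = False
    adapters = nn.ModuleList([Adapter(d_model) for _ in range(num_layers)])
    return adapters  # only these parameters are passed to the optimizer
```

Because the backbone stays frozen, the number of trainable parameters grows only with the adapter bottleneck size, which is what makes the approach lightweight.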

Cited by 5 publications (7 citation statements) · References 21 publications
“…Moreover, Dong et al. [1277] proposed a listen-understand-translate model, in which the proposed framework utilizes a pre-trained BERT model to enforce the upper encoder to produce as much semantic information as possible, without extra data. Le et al. [1278] have presented a study of adapters for multilingual ST and shown that language-specific adapters can enable a fully trained multilingual ST model to be further specialized in each language pair.…”
Section: Pre-training With Unlabeled Speech/Text Data
confidence: 99%
“…The development of multilingual models for machine translation, speech translation, or speech recognition often involves a trade-off between versatility and specialization [22]. The motivation comes from the belief that some features are shared between languages while each language also needs to be selectively represented, so networks are encouraged to change "modes" depending on the language being processed [23].…”
Section: Language Adaptive Components
confidence: 99%
“…The motivation comes from the belief that some features are shared between languages while each language also needs to be selectively represented, so networks are encouraged to change "modes" depending on the language being processed [23]. Since then, multilingual model designers have opted for network components dedicated to each language, ranging from weight generators [24] to adapters [15,22] or, more recently, adaptive weights that add scales and biases to every weight matrix in the architecture [16]. In this paper, the last two options are selected for investigation because they are computationally manageable.…”
Section: Language Adaptive Components
confidence: 99%
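The "adaptive weights adding scales and biases to each weight matrix" mentioned in the statement above can be pictured with a short hedged sketch, assuming a per-language elementwise scale and bias applied to a shared weight matrix; the class and argument names (LanguageAdaptiveLinear, lang_id) are assumptions, not the cited authors' code.

```python
import torch
import torch.nn as nn

class LanguageAdaptiveLinear(nn.Module):
    """Shared linear weight modulated by a learned per-language scale and bias."""
    def __init__(self, d_in: int, d_out: int, num_langs: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)     # shared across languages
        self.scale = nn.Parameter(torch.ones(num_langs, d_out, d_in))   # per-language multiplicative factor
        self.bias = nn.Parameter(torch.zeros(num_langs, d_out, d_in))   # per-language additive factor

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # Specialize the shared weight for the current language, then apply it.
        w = self.weight * self.scale[lang_id] + self.bias[lang_id]
        return x @ w.t()
```

Like adapters, this keeps most parameters shared while giving each language a small set of dedicated parameters, which is why the citing work treats both options as computationally manageable.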
“…We propose a music genre-conditioned training strategy to adapt an end-to-end lyrics transcription system according to the music genre. Inspired by the success of adaptive fine-tuning with pre-trained models in natural language processing [19] and speech translation [20][21][22], we propose to incorporate genre-specific adapters into a pre-trained transformer-based polyphonic lyrics transcription model [23].…”
Section: Introduction
confidence: 99%