2023
DOI: 10.1016/j.ipm.2022.103148

Adapting multilingual speech representation model for a new, underresourced language through multilingual fine-tuning and continued pretraining

Cited by 17 publications (9 citation statements)
References 19 publications
“…It may be that CPT only yields appreciable performance gains once a sufficient amount of unlabeled audio can be obtained (e.g. 200 hours of Ainu: Nowakowski et al, 2023). However, obtaining such a large amount of data for minority languages or language variants such as Gronings, Besemah, and Nasal is unlikely.…”
Section: Discussion
confidence: 99%
“…Subsequent advancements by Nowakowski [Nowakowski et al, 2023] involved refining a multilingual speech representation model for lesser-resourced languages via multilingual fine-tuning and ongoing pretraining, illustrating the adaptability of NLP techniques to languages with limited resources like Ainu.…”
Section: NLP for Ainu
confidence: 99%
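For context on the "multilingual fine-tuning" half of this approach: it amounts to supervised CTC fine-tuning of a pretrained multilingual speech encoder on labeled audio. Below is a minimal sketch of that general recipe using Hugging Face transformers; the checkpoint name, the "vocab.json" vocabulary file, and all hyperparameters are illustrative assumptions, not the paper's actual setup. In a multilingual variant, the labeled batches would mix several languages over a shared character vocabulary; a single target language is shown for brevity.

```python
import torch
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

# Tokenizer over the target-language characters; "vocab.json" is a placeholder
# for a vocabulary built from the fine-tuning transcripts.
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load a multilingual pretrained encoder and attach a randomly initialized
# CTC head sized to the target vocabulary. The checkpoint name is illustrative.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()  # common practice: keep the CNN frontend frozen
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def finetune_step(audio_batch, transcripts):
    """One supervised CTC step; audio_batch is a list of 16 kHz float arrays."""
    inputs = processor(audio_batch, sampling_rate=16_000,
                       return_tensors="pt", padding=True)
    labels = tokenizer(transcripts, return_tensors="pt", padding=True).input_ids
    labels = labels.masked_fill(labels == tokenizer.pad_token_id, -100)  # ignore padding in the loss
    loss = model(inputs.input_values,
                 attention_mask=inputs.attention_mask, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```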
“…Cross-dataset evaluation for popular English speech corpora indicates that CPT helps to reduce the error rate in the target domain. In [43] and [11] CPT is utilized for cross-lingual adaptation of wav2vec2 for Korean and Ainu respectively. Notably for Ainu, which is an endangered language, CPT has resulted in significant system improvement.…”
[Figure: the proposed domain-adaptive finetuning stage (right), in which the speech recognition task is learned from transcribed source-domain data, while adaptation to the target domain is performed by including the self-supervised loss over (audio-only) source- and target-domain data.]
Section: Leveraging In-domain Self-supervision
confidence: 99%
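As background for the continued pretraining (CPT) technique these statements describe: CPT resumes wav2vec 2.0's self-supervised masked contrastive objective on unlabeled target-language audio before any supervised fine-tuning, so no transcripts are needed. A minimal sketch, assuming Hugging Face transformers and an illustrative XLS-R checkpoint (not the cited systems' actual code; mask_prob and the learning rate are assumptions):

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForPreTraining
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    _compute_mask_indices,
    _sample_negative_indices,
)

# The checkpoint name is an assumption; any multilingual wav2vec 2.0
# pretraining checkpoint could stand in here.
model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-xls-r-300m")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()

def cpt_step(raw_audio_batch):
    """One CPT step on unlabeled target-language audio (16 kHz float arrays)."""
    inputs = feature_extractor(raw_audio_batch, sampling_rate=16_000,
                               return_tensors="pt", padding=True)
    batch_size = inputs.input_values.shape[0]
    # Length of the latent feature sequence produced by the CNN frontend.
    seq_len = int(model._get_feat_extract_output_lengths(inputs.input_values.shape[-1]))

    # Mask spans of latent features and sample distractors for the
    # contrastive (self-supervised) objective -- no transcripts involved.
    mask_time_indices = _compute_mask_indices(
        (batch_size, seq_len), mask_prob=0.65, mask_length=10)
    sampled_negative_indices = _sample_negative_indices(
        (batch_size, seq_len),
        num_negatives=model.config.num_negatives,
        mask_time_indices=mask_time_indices)

    outputs = model(
        inputs.input_values,
        mask_time_indices=torch.tensor(mask_time_indices, dtype=torch.bool),
        sampled_negative_indices=torch.tensor(sampled_negative_indices, dtype=torch.long),
    )
    outputs.loss.backward()  # contrastive + codebook-diversity loss
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

In the domain-adaptive finetuning variant the figure above describes, a step like this self-supervised loss would be combined with the supervised CTC loss in the same training loop rather than run as a separate stage.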
“…Furthermore, UDA has been used for speaker adaptation, and to improve performance under speaker, gender and accent variability [7], [8]. UDA has also been employed for multilingual and cross-lingual ASR, in order to improve ASR models for low-resource languages [9], adapt to different dialects [10], and even train speech recognition systems for endangered languages [11].…”
Section: Introduction
confidence: 99%