Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.96

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec

Abstract: "Transcription bottlenecks", created by a shortage of effective human transcribers, are one of the main challenges to endangered language (EL) documentation. Automatic speech recognition (ASR) has been suggested as a tool to overcome such bottlenecks. Following this suggestion, we investigated the effectiveness for EL documentation of end-to-end ASR, which, unlike Hidden Markov Model ASR systems, eschews linguistic resources but is instead more dependent on large-data settings. We open source a Yoloxóchitl Mixtec…

Cited by 10 publications (8 citation statements)
References 18 publications

“…In particular, for small corpora, the DNN architecture has been demonstrated to yield better results than statistical alternatives (e.g., subspace Gaussian mixture models) and other neural architectures (e.g., time delay neural networks) (Morris, 2021). We also found the DNN to be substantially more accurate than the endangered language end-to-end recipe (Shi et al., 2021) in ESPnet (Watanabe et al., 2018). [Footnote 1: Our experiments using ESPnet and wav2vec 2.0 to fine-tune from multilingual models yielded inconsistent and weak …] Crucially, however, we note that our goal is not to improve upon the current state of the art for low-resource ASR but rather to examine what data partitioning strategies and evaluation methods lead to reliable estimates in low-resource settings with an already strong model architecture.…”
Section: Language and Acoustic Models (mentioning)
confidence: 77%
“…In experiment SSL, the FBANK feature extractor (used in Base) is replaced by the pretrained HuBERT model, which is fine-tuned during training. Learnable combinations: Experiments Linear, Conv. and co-Att.…”
Section: Methods (mentioning)
confidence: 99%
“…End-to-end models based on deep learning have demonstrated their superiority over conventional hidden Markov-based models on speech tasks for some corpora [1][2][3][4]. End-to-end models could be beneficial to low-resource speech tasks because these models (1) alleviate the need for language-specific resources such as lexicons [5][6][7], and (2) can be trained multilingually to facilitate cross-lingual transfer between high-resource and low-resource languages through shared architecture and weights [8].…”
Section: Introduction (mentioning)
confidence: 99%
“…The recent improvement of speech recognition for low-resource languages has also been seen as a way to mitigate the transcription bottleneck by automatically transcribing large amounts of untranscribed speech data (e.g., Foley et al., 2018; Shi et al., 2021; Adams et al., 2021).…”
Section: Fieldwork Technologies (mentioning)
confidence: 99%