Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
DOI: 10.18653/v1/2021.americasnlp-1.28

Moses and the Character-Based Random Babbling Baseline: CoAStaL at AmericasNLP 2021 Shared Task

Abstract: We evaluated a range of neural machine translation techniques developed specifically for low-resource scenarios. Unsuccessfully. In the end, we submitted two runs: (i) a standard phrase-based model, and (ii) a random babbling baseline using character trigrams. We found that it was surprisingly hard to beat (i), in spite of this model being, in theory, a bad fit for polysynthetic languages; and more interestingly, that (ii) was better than several of the submitted systems, highlighting how difficult low-resource…
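To make the second run concrete: a character-trigram "babbling" baseline can be built by estimating trigram statistics over target-side training text and then sampling output one character at a time. The sketch below is a minimal illustration under that assumption, not the authors' actual implementation; the names train_trigram_model and babble, the padding markers, and the length cap are all hypothetical.

```python
import random
from collections import defaultdict

def train_trigram_model(corpus_lines):
    """Count character trigrams: map a 2-character history to next-character counts."""
    model = defaultdict(lambda: defaultdict(int))
    for line in corpus_lines:
        text = f"^^{line}$"  # "^" pads the start of a line, "$" marks its end
        for i in range(len(text) - 2):
            history, nxt = text[i:i + 2], text[i + 2]
            model[history][nxt] += 1
    return model

def babble(model, max_len=100):
    """Sample characters from the trigram counts until the end marker appears
    or max_len characters have been generated."""
    history, out = "^^", []
    for _ in range(max_len):
        choices = model.get(history)
        if not choices:  # unseen history: stop babbling
            break
        chars, weights = zip(*choices.items())
        c = random.choices(chars, weights=weights, k=1)[0]
        if c == "$":
            break
        out.append(c)
        history = history[1] + c  # slide the 2-character window forward
    return "".join(out)

# Toy usage: train on two target-side lines, then babble one "translation".
model = train_trigram_model(["a toy target sentence", "another target sentence"])
print(babble(model))
```

A baseline of this kind produces output matching the target language's character statistics while carrying no information from the source sentence, which is what makes it a useful sanity-check lower bound for surface-level evaluation metrics.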

Cited by 1 publication (1 citation statement, 2022)
References 16 publications (7 reference statements)
“…most relevant system properties, are shown in Table 5: we omit the "random babbling" baseline by Bollmann et al. (2021) as well as all systems trained on parts or all of the development set from this analysis. Due to the nature of the shared task we summarize here, systems are trained on different datasets and not directly comparable; we focus on general trends and leave a principled comparison of the effects and interactions of model architectures, training techniques, and datasets to future work.…”
Section: Machine Translation Results (citation type: mentioning, confidence: 99%)