Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas 2021
DOI: 10.18653/v1/2021.americasnlp-1.29
The Helsinki submission to the AmericasNLP shared task

Abstract: The University of Helsinki participated in the AmericasNLP shared task for all ten language pairs. Our multilingual NMT models reached the first rank on all language pairs in track 1, and first rank on nine out of ten language pairs in track 2. We focused our efforts on three aspects: (1) the collection of additional data from various sources such as Bibles and political constitutions, (2) the cleaning and filtering of training data with the OpusFilter toolkit, and (3) different multilingual training technique…

Cited by 9 publications (17 citation statements); references 13 publications.
“BLEU and chrF are the two metrics adopted by the AmericasNLP 2021 Shared Task. We surpassed the winner of AmericasNLP 2021 (Vázquez et al., 2021), in either or both metrics, for 5 language pairs with the following target languages: Bribri (bzd), Asháninka (cni), Wixarika (hch), Nahuatl (nah), and Hñähñu (oto). Notably, we double the BLEU score for es-bzd, increasing it by about 7.7 BLEU points and 0.1 chrF.”
Section: Results
Confidence: 99%
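The comparisons above hinge on the chrF metric alongside BLEU. As a rough illustration of how chrF scores character-level overlap, here is a simplified, self-contained sketch of the character n-gram F-score (Popović, 2015) with the standard β=2; the function names and the uniform averaging over n-gram orders are simplifications of my own, not the shared task's evaluation code — official evaluations use the sacrebleu implementation:

```python
from collections import Counter

def char_ngrams(text, n):
    """Count all character n-grams of length n in the string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: macro-averaged character n-gram precision/recall
    over orders 1..max_n, combined into an F-score that weights recall
    beta times as much as precision."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp_ngrams = char_ngrams(hypothesis, n)
        ref_ngrams = char_ngrams(reference, n)
        if not hyp_ngrams or not ref_ngrams:
            continue  # string shorter than n: skip this order
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        precisions.append(overlap / sum(hyp_ngrams.values()))
        recalls.append(overlap / sum(ref_ngrams.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0.0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An identical hypothesis and reference score 1.0, and strings with no shared characters score 0.0; because chrF operates on character n-grams rather than whole tokens, it rewards partially correct word forms, which matters for the morphologically rich Indigenous languages in this task.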
“We are able to surpass the previous SOTA in five language pairs, and mBART50 curr achieves 7.143 BLEU and 0.3134 chrF on average, compared to the previous SOTA's 6.693 BLEU and 0.3171 chrF on average. It can be hypothesized that we improve the average BLEU score by 0.45, achieve a comparable average chrF, and surpass the SOTA in five language pairs because we use an MT model pretrained on 50 languages, while Vázquez et al. (2021) pretrain their model only on es-en. We suspect that some languages other than Spanish and English contribute positive transfer to the Indigenous languages.”
Section: Comparisons to SOTA
Confidence: 94%