Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research W 2021
DOI: 10.18653/v1/2021.eacl-srw.22

The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation

Abstract: This paper evaluates the performance of several modern subword segmentation methods in a low-resource neural machine translation setting. We compare segmentations produced by applying BPE at the token or sentence level with morphologically-based segmentations from LMVR and MORSEL. We evaluate translation tasks between English and each of Nepali, Sinhala, and Kazakh, and predict that using morphologically-based segmentation methods would lead to better performance in this setting. However, comparing to BPE, we …

Cited by 7 publications
(4 citation statements)
References 15 publications
“…Unsupervised morphological segmentation has not shown consistent improvements (Zhou, 2018; Saleva and Lignos, 2021; Domingo et al, 2023) […] Koehn and Knight, 2003; Riedl and Biemann, 2016; Tuggener, 2016) since most prior datasets do not contain any (Henrich and Hinrichs, 2011; van Zaanen et al, 2014). It is not trivial to obtain negative examples from Wiktionary since a large amount of compound words are not categorized as such, leading to many false negatives.…”
Section: Related Work (mentioning)
confidence: 99%
“…Several researchers have investigated the effect of applying different morphological and agnostic segmentation approaches on the MT performance for monolingual languages. Roest et al (2020); Saleva and Lignos (2021) show that unsupervised morphology-based segmentation like Linguistically Motivated Vocabulary Reduction (LMVR) (Ataman et al, 2017), Morfessor, and FlatCat for Nepali-English, Sinhala-English, Kazakh-English, and Inuktitut-English language pairs show either no improvement or no significant improvement over the agnostic BPE segmentation (Sennrich et al, 2016) in translation tasks. Meanwhile, Mager et al (2022) and Ataman et al (2017) show that for polysynthetic and highly agglutinative languages, unsupervised morphology-based segmentation outperforms BPEs (Sennrich et al, 2016) in MT tasks in both directions.…”
Section: Related Work (mentioning)
confidence: 99%
“…Using unsupervisedly obtained "morphological" subwords on the other hand, only Ataman and Federico (2018b) find that a model based on Morfessor FlatCat can outperform BPE; Zhou (2018), Macháček et al (2018), and Saleva and Lignos (2021) find no reliable improvement over BPE for translation. Banerjee and Bhattacharyya (2018) analyze translations obtained segmenting with Morfessor and BPE, and conclude that a possible improvement depends on the similarity of the languages.…”
Section: Comparing Morphological Segmentation to BPE and Friends (mentioning)
confidence: 99%