Proceedings of the 3rd Workshop on Neural Generation and Translation 2019
DOI: 10.18653/v1/d19-5619

On the Importance of Word Boundaries in Character-level Neural Machine Translation

Abstract: Neural Machine Translation (NMT) models generally perform translation using a fixed-size lexical vocabulary, which is an important bottleneck on their generalization capability and overall translation quality. The standard approach to overcome this limitation is to segment words into subword units, typically using some external tools with arbitrary heuristics, resulting in vocabulary units not optimized for the translation task. Recent studies have shown that the same approach can be extended to perform NMT dir…
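The segmentation bottleneck described in the abstract can be illustrated with a toy example. The sketch below is not the paper's pipeline; the subword vocabulary, the greedy longest-match rule, and the sample sentence are assumptions made only to contrast word-, subword-, and character-level segmentation of the same input:

# Illustrative sketch only: contrasts word-, subword-, and character-level
# segmentation. The toy vocabulary and greedy longest-match rule stand in for
# external subword tools such as BPE; they are not the paper's method.

def word_segment(sentence):
    return sentence.split()

def char_segment(sentence):
    # Character-level NMT consumes individual symbols (space shown as "_").
    return ["_" if c == " " else c for c in sentence]

def subword_segment(word, vocab):
    # Greedy longest-match against a fixed subword vocabulary, with a
    # single-character fallback so every word can be segmented.
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab or end - start == 1:
                pieces.append(piece)
                start = end
                break
    return pieces

if __name__ == "__main__":
    vocab = {"trans", "lation", "qual", "ity", "low"}  # toy subword vocabulary
    sentence = "low translation quality"
    print(word_segment(sentence))
    # ['low', 'translation', 'quality']
    print([p for w in sentence.split() for p in subword_segment(w, vocab)])
    # ['low', 'trans', 'lation', 'qual', 'ity']
    print(char_segment(sentence))
    # ['l', 'o', 'w', '_', 't', 'r', 'a', 'n', 's', ...]

A fixed word vocabulary fails on unseen words, whereas the subword and character segmentations can represent any input string; this is the generalization bottleneck the abstract refers to.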

Cited by 13 publications (12 citation statements)
References 17 publications (25 reference statements)
“…On the other hand, the character-level models perform worse on compounds, which are a local phenomenon. Ataman et al. (2019) observed similar results on compounds in their hybrid character-word-level method. We suspect this might be caused by poor memorization of some compounds in the character models.…”
Section: Results (supporting)
confidence: 52%
“…Cherry et al. (2018) showed that with a sufficiently deep recurrent model, no changes to the model are necessary, and character-level systems can still reach translation quality on par with subword models. Luong and Manning (2016) and Ataman et al. (2019) can leverage character-level information, but they require tokenized text as input and only have access to the character-level embeddings of predefined tokens.…”
Section: Related Work (mentioning)
confidence: 99%
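The limitation described above (character-level information used only inside predefined tokens) can be sketched roughly as follows. This is a minimal illustration of the general idea behind such hybrid word/character models, not the architecture of either cited paper; the class name, dimensions, and the choice of a GRU as the character composition function are assumptions:

# Minimal sketch (assumed names and dimensions): a word embedding composed
# from character embeddings. The model sees characters only within token
# boundaries fixed in advance, so the input must already be tokenized.
import torch
import torch.nn as nn

class CharWordEmbedder(nn.Module):
    def __init__(self, n_chars, char_dim=32, word_dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.encoder = nn.GRU(char_dim, word_dim, batch_first=True)

    def forward(self, char_ids):            # char_ids: (n_words, max_word_len)
        chars = self.char_emb(char_ids)     # (n_words, max_word_len, char_dim)
        _, h = self.encoder(chars)          # h: (1, n_words, word_dim)
        return h.squeeze(0)                 # one vector per predefined token

if __name__ == "__main__":
    embedder = CharWordEmbedder(n_chars=100)
    word = torch.tensor([[8, 1, 21, 19]])   # one 4-character word, arbitrary ids
    print(embedder(word).shape)             # torch.Size([1, 64])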
“…More specifically, the deletion of negation cues causes more errors. Ataman et al. (2019) show that character-level models perform better than subword-level models on negation. Instead, we evaluate NMT models with different neural networks to assess their ability to translate negation, by scoring contrastive translation pairs.…”
Section: Introduction (mentioning)
confidence: 92%
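Contrastive-pair scoring as mentioned in this statement can be sketched as follows. The placeholder scorer (a simple negation-cue match) is an assumption made only to keep the example self-contained; in an actual evaluation the score would be the NMT model's log-probability of each candidate translation given the source:

# Sketch of contrastive-pair evaluation for negation: the model scores the
# correct translation and a contrastive variant (negation cue deleted); the
# pair counts as correct when the true translation scores higher.
# NOTE: `toy_score` is a placeholder; a real evaluation would use the NMT
# model's log-probability P(candidate | source) instead.
NEGATION_CUES = {"nicht", "kein", "keine", "not", "no", "never"}

def toy_score(source, candidate):
    # Placeholder: reward candidates whose negation status matches the source.
    src_neg = any(w in NEGATION_CUES for w in source.lower().split())
    cand_neg = any(w in NEGATION_CUES for w in candidate.lower().split())
    return 1.0 if src_neg == cand_neg else 0.0

def contrastive_accuracy(pairs, score=toy_score):
    correct = sum(score(src, good) > score(src, bad) for src, good, bad in pairs)
    return correct / len(pairs)

if __name__ == "__main__":
    # (source, correct translation, contrastive variant with negation removed)
    pairs = [
        ("Er hat das Buch nicht gelesen.",
         "He did not read the book.",
         "He read the book."),
    ]
    print(f"contrastive accuracy: {contrastive_accuracy(pairs):.2f}")  # 1.00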
“…The second reason for the better resilience is that the GRU network is better at modelling shorter groups of words [51]. It does not suffer from the problem posed by mapping viseme clusters to words with the GPT-based iterator, whereby compound errors accumulate as words are combined across iterations and the sentence being decoded depends on the conditional dependence of word combinations.…”
Section: Experiments and Results (mentioning)
confidence: 99%