Character-level Transformer-based Neural Machine Translation

Banar, Nikolay; Daelemans, Walter; Kestemont, Mike

doi:10.48550/arxiv.2005.11239

Cited by 1 publication

(3 citation statements)

References 18 publications

“…However, translating at character level may incur significant computational overhead. Therefore, later works on character-level NMT (Cherry et al, 2018;Banar et al, 2020) mainly focus on reducing computation cost of them. Cherry et al (2018) show that by employing source sequence compression techniques, the quality and efficiency of character-based models can be properly balanced.…”

Section: Related Work (mentioning)

confidence: 99%

“…Cherry et al (2018) show that by employing source sequence compression techniques, the quality and efficiency of character-based models can be properly balanced. Banar et al (2020) share the same idea as Cherry et al (2018) but build their models using Transformer architecture. Our work differs from theirs in that we aim to analyze the performance of existing models instead of exploring novel architectures.…”

Section: Related Work (mentioning)

confidence: 99%

“…An alternative segmentation choice is to use fully character-level (CHAR) models (Lee et al, 2017;Cherry et al, 2018;Gupta et al, 2019;Gao et al, 2020;Banar et al, 2020), which has the potential to alleviate above issues. CHAR does not need to learn any segmentation rules and keeps all available information in the surface form, avoiding the risk of information loss due to improper segmentation.…”

Section: Introduction (mentioning)

confidence: 99%

When is Char Better Than Subword: A Systematic Study of Segmentation Algorithms for Neural Machine Translation

Shen, Huang, Dai et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer

Subword segmentation algorithms have been a de facto choice when building neural machine translation systems. However, most of them need to learn a segmentation model based on some heuristics, which may produce suboptimal segmentation. This can be problematic in some scenarios when the target language has rich morphological changes or there is not enough data for learning compact composition rules. Translating at fully character level has the potential to alleviate the issue, but the empirical performance of character-based models has not been fully explored. In this paper, we present an in-depth comparison between character-based and subword-based NMT systems under three settings: translating to typologically diverse languages, training with low resources, and adapting to unseen domains. Experimental results show strong competitiveness of character-based models. Further analyses show that, compared to subword-based models, character-based models are better at handling morphological phenomena and generating rare and unknown words, and are more suitable for transferring to unseen domains.

Section: Related Work (mentioning)

confidence: 99%