2018
DOI: 10.48550/arxiv.1809.02694
Preprint

Neural Machine Translation of Logographic Languages Using Sub-character Level Information

Cited by 1 publication (1 citation statement) | References 0 publications
“…Plus, CJK languages do not contain white space as obvious word boundaries in the corpus (Moh and Zhang 2012). Researchers have attempted to mitigate these problems by borrowing information from a parallel corpus of another language, commonly in multilingual corpora for translation (Luo, Tinsley, and Lepage 2013; Zhang and Komachi 2018). Thanks to recent advances in neural networks and pretrained models like BERT (Devlin et al. 2018), there has been progress in identifying CJK words that span multiple characters (Hiraoka, Shindo, and Matsumoto 2019).…”
Section: Tokenization For Various Natural Languages
confidence: 99%
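The quoted passage notes that CJK text lacks whitespace word boundaries, which is why whitespace tokenization cannot be applied directly. A minimal sketch of the problem (the example strings are illustrative and not taken from the paper):

```python
# English words are delimited by spaces, so whitespace splitting works.
en = "neural machine translation"
print(en.split())  # ['neural', 'machine', 'translation']

# The same phrase in Chinese has no spaces between words,
# so whitespace splitting returns the entire string as one token.
zh = "神经机器翻译"
print(zh.split())  # ['神经机器翻译']

# A common baseline fallback is character-level tokenization,
# which yields units but loses multi-character word boundaries.
print(list(zh))    # ['神', '经', '机', '器', '翻', '译']
```

Recovering the actual word boundaries (here 神经 / 机器 / 翻译) requires a segmentation model, which is the gap the cited BERT-based and multilingual approaches address.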