2021
DOI: 10.1145/3431727

Using Sub-character Level Information for Neural Machine Translation of Logographic Languages

Abstract: Logographic and alphabetic languages (e.g., Chinese vs. English) have different writing systems linguistically. Languages belonging to the same writing system usually exhibit more sharing information, which can be used to facilitate natural language processing tasks such as neural machine translation (NMT). This article takes advantage of the logographic characters in Chinese and Japanese by decomposing them into smaller units, thus more optimally utilizing the information these characters share in the trainin…

Cited by 6 publications (2 citation statements)
References: 15 publications
“…BPE denotes a subword based NMT with a joint vocabulary of 32K BPE tokens † † . Different to NMT for alphabetic languages, we can observe that the BPE approach in Japanese-Chinese NMT is inferior to the baseline using characters, which is consistent with the conclusions in [12], [13]. One reason is that the words in logographic languages are far shorter than words in alphabetic languages, which causes BPE failing to solve the low frequency words' problem.…”
Section: Results
Citation type: supporting
confidence: 76%
“…On the other hand, language differences will also affect the political, economic, cultural, and military exchanges between countries. In recent years, the contacts between China and Japan have become more and more frequent and close, and doing a good job in Japanese translation has an important impact on the exchanges between the two countries [1][2][3][4].…”
Section: Introduction
Citation type: mentioning
confidence: 99%