Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop 2020
DOI: 10.18653/v1/2020.acl-srw.11

Zero-shot North Korean to English Neural Machine Translation by Character Tokenization and Phoneme Decomposition

Abstract: The primary limitation of North Korean to English translation is the lack of a parallel corpus; therefore, high translation accuracy cannot be achieved. To address this problem, we propose a zero-shot approach using South Korean data, which are remarkably similar to North Korean data. We train a neural machine translation model after tokenizing a South Korean text at the character level and decomposing characters into phonemes. We demonstrate that our method can effectively learn North Korean to English transl…
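
To make the preprocessing idea in the abstract concrete, here is a minimal sketch of character-level tokenization followed by decomposition of each Hangul syllable into its component letters (jamo). It is not the authors' pipeline: it assumes that Unicode NFD normalization is an acceptable stand-in for their phoneme decomposition, and the function name `to_phonemes` and the `<sp>` whitespace marker are hypothetical choices made for this illustration.

```python
import unicodedata


def to_phonemes(text: str) -> list[str]:
    """Split text into characters, then split each Hangul syllable into jamo.

    Assumption: Unicode NFD normalization stands in for phoneme decomposition.
    The "<sp>" whitespace marker is a hypothetical convention for this sketch.
    """
    tokens: list[str] = []
    for ch in text:  # character-level tokenization
        if ch.isspace():
            tokens.append("<sp>")
        else:
            # NFD maps a precomposed Hangul syllable to its conjoining jamo;
            # characters with no canonical decomposition pass through unchanged.
            tokens.extend(unicodedata.normalize("NFD", ch))
    return tokens


if __name__ == "__main__":
    print(to_phonemes("한국 어"))
```

Because the decomposition is canonical, joining the tokens and applying NFC normalization restores the original syllables, so a model's output at this granularity can be mapped back to ordinary Korean text.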

Cited by 2 publications (1 citation statement)
References 13 publications
“…In relation to Korean in this regard, the concept of detailed segmentation intrigued many researchers in NLP in general (Park et al 2018; Kim et al 2020; Yongseok and Lee 2020; Park et al 2020), for a Korean word had a large volume of affixes and morphemes. In MT in specific, various combinations of token units were suggested.…”
Section: Related Work (mentioning)
confidence: 99%