2017
DOI: 10.1162/tacl_a_00067
Fully Character-Level Neural Machine Translation without Explicit Segmentation

Abstract: Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. We employ a character-level convolutional network with max-pooling at the encoder to reduce the length of source representation, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities. …
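A minimal sketch of the encoder idea described in the abstract, assuming a PyTorch-style implementation: character embeddings pass through a convolution and strided max-pooling, so the sequence handed to the recurrent encoder is several times shorter than the raw character sequence. The layer sizes, single filter width, and pooling stride here are illustrative; the paper's actual encoder uses multiple filter widths, highway layers, and its own hyperparameters.

```python
import torch
import torch.nn as nn

class CharConvEncoder(nn.Module):
    """Simplified character-level encoder: conv + max-pool shortens the sequence."""
    def __init__(self, vocab_size=128, emb_dim=64, conv_dim=128, pool_stride=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # width-5 convolution over character embeddings
        self.conv = nn.Conv1d(emb_dim, conv_dim, kernel_size=5, padding=2)
        # non-overlapping max-pooling shrinks the time axis by pool_stride
        self.pool = nn.MaxPool1d(kernel_size=pool_stride, stride=pool_stride)
        self.rnn = nn.GRU(conv_dim, conv_dim, batch_first=True, bidirectional=True)

    def forward(self, char_ids):                  # (batch, seq_len)
        x = self.embed(char_ids).transpose(1, 2)  # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))
        x = self.pool(x).transpose(1, 2)          # (batch, seq_len // stride, conv_dim)
        out, _ = self.rnn(x)
        return out                                # shortened source representation

enc = CharConvEncoder()
ids = torch.randint(0, 128, (2, 100))  # two "sentences" of 100 characters each
print(enc(ids).shape)                  # torch.Size([2, 20, 256])
```

Because the recurrent layers run over the pooled sequence rather than raw characters, training cost approaches that of a subword-level model, which is the speed argument the abstract makes.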

Cited by 327 publications (332 citation statements)
References 18 publications
“…In NMT training, using subword units such as byte-pair encoding (Sennrich, Haddow, and Birch 2016) has become a de facto standard in training competition-grade systems (Pinnis et al. 2017). A few have tried morpheme-based segmentation (Bradbury and Socher 2016), and several even used character-based systems (Chung, Cho, and Bengio 2016; Lee, Cho, and Hofmann 2017) to achieve similar performance as the BPE-segmented systems. Table 2 shows an example of each representation unit.…”
Section: Morphology
confidence: 99%
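As a reference point for the BPE segmentation this excerpt calls the de facto standard, here is a toy version of the merge-learning loop in the spirit of Sennrich, Haddow, and Birch (2016); the tiny vocabulary and the number of merges are illustrative, not taken from any of the cited papers.

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Rewrite every word so the chosen pair becomes a single symbol."""
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}

# words start fully split into characters, plus an end-of-word marker
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

for _ in range(5):
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
    print(best)  # first merges: ('e', 's'), ('es', 't'), ('est', '</w>'), ...
```

Each merge adds one subword symbol to the vocabulary, which is why BPE gives a tunable middle ground between the word-level and character-level units the excerpt contrasts.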
“…Chung, Cho, and Bengio (2016) used a combination of BPE-based encoder and character-based decoder to improve translation quality. Motivated by their findings, Lee, Cho, and Hofmann (2017) explored using fully character representations (with no word boundaries) on both the source and target sides. As BPE segmentation is not linguistically motivated, an alternative of using morpheme-based segmentation has been explored in Bradbury and Socher (2016).…”
Section: Introduction
confidence: 99%
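A hypothetical illustration of the two representation extremes contrasted above: a fully character-level model consumes the raw string, spaces included, while a BPE-segmented model consumes learned subword units. The sample sentence and the subword split are invented for this example.

```python
sentence = "the lowest price"

# fully character-level: the raw string is the input, spaces included,
# so no explicit segmentation step is needed
char_units = list(sentence)

# subword-level: one possible BPE segmentation of the same sentence
subword_units = ["the", "low", "est", "price"]

print(char_units)     # ['t', 'h', 'e', ' ', 'l', 'o', 'w', 'e', 's', 't', ...]
print(subword_units)  # ['the', 'low', 'est', 'price']
```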
“…Specifically, the target language ID j is given to the model so that it knows to which language it translates, and this can be readily implemented by setting the initial token y_0 = j for the target sentence to start with.¹ The multilingual NMT model shares a single representation space across multiple languages, which has been found to facilitate translating low-resource language pairs (Firat et al., 2016a; Lee et al., 2016; Gu et al., 2018b,c).…”
Section: Introduction
confidence: 99%
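A minimal sketch of the target-language-token trick quoted above: decoding starts from a language ID token y_0 = j rather than a generic beginning-of-sentence symbol, so a single model can translate into several target languages. The vocabulary and token names (<2de>, <2fr>) are assumptions for illustration, not from the cited papers.

```python
target_vocab = {'<pad>': 0, '<bos>': 1, '<eos>': 2, '<2de>': 3, '<2fr>': 4}

def decoder_input(target_ids, target_lang):
    """Prepend the language ID as y_0; the decoder conditions on it."""
    lang_token = target_vocab[f'<2{target_lang}>']
    return [lang_token] + target_ids

# translate into German: y_0 = <2de>
print(decoder_input([17, 42, 9, target_vocab['<eos>']], 'de'))
# [3, 17, 42, 9, 2]
```

Since the language token occupies the same embedding space as ordinary target symbols, no architectural change is needed; the decoder simply learns to associate it with the right output language.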
“…Recently, deep learning (DL) has become a driver of many new real-world applications ranging from language translation to computer vision. A DL architecture, U-net, has been successfully applied to predict dose distributions for prostate cancer radiotherapy.…”
Section: Introduction
confidence: 99%