Fully Character-Level Neural Machine Translation without Explicit Segmentation

Lee, Jason; Cho, Kyunghyun; Hofmann, Thomas

doi:10.1162/tacl_a_00067

Cited by 327 publications

(332 citation statements)

References 18 publications

Supporting

Mentioning

329

Contrasting


“…In NMT training, using subword units such as byte-pair encoding (Sennrich, Haddow, and Birch 2016) has become a de facto standard in training competition grade systems (Pinnis et al. 2017). A few have tried morpheme-based segmentation (Bradbury and Socher 2016), and several even used character-based systems (Chung, Cho, and Bengio 2016; Lee, Cho, and Hofmann 2017) to achieve similar performance as the BPE-segmented systems. Table 2 shows an example of each representation unit.…”

Section: Morphology (mentioning)

confidence: 99%

“…Chung, Cho, and Bengio (2016) used a combination of BPE-based encoder and character-based decoder to improve translation quality. Motivated by their findings, Lee, Cho, and Hofmann (2017) explored using fully character representations (with no word boundaries) on both the source and target sides. As BPE segmentation is not linguistically motivated, an alternative of using morpheme-based segmentation has been explored in Bradbury and Socher (2016).…”

Section: Introduction (mentioning)

confidence: 99%


On the Linguistic Representational Power of Neural Machine Translation Models

Belinkov, Durrani, Dalvi et al. 2020

Computational Linguistics


Despite the recent success of deep neural networks in natural language processing (NLP) and other spheres of artificial intelligence (AI), their interpretability remains a challenge. We analyze the representations learned by neural machine translation (NMT) models at various levels of granularity and evaluate their quality through relevant extrinsic properties. In particular, we seek answers to the following questions: (i) How accurately is word-structure captured within the learned representations, which is an important aspect in translating morphologically-rich languages? (ii) Do the representations capture long-range dependencies, and effectively handle syntactically divergent languages? (iii) Do the representations capture lexical semantics? We conduct a thorough investigation along several parameters: (i) Which layers in the architecture capture each of these linguistic phenomena; (ii) How does the choice of translation unit (word, character, or subword unit) impact the linguistic properties captured by the underlying representations? (iii) Do the encoder and decoder learn differently and independently? (iv) Do the representations learned by multilingual NMT models capture the same amount of linguistic information as their bilingual counterparts? Our data-driven, quantitative evaluation illuminates important aspects in NMT models and their ability to capture various linguistic phenomena. We show that deep NMT models trained in an end-to-end fashion, without being provided any direct supervision during the training process, learn a non-trivial amount of linguistic information. 
Notable findings include the following observations: (i) Word morphology and part-of-speech information are captured at the lower layers of the model; (ii) In contrast, lexical semantics or non-local syntactic and semantic dependencies are better represented at the higher layers of the model; (iii) Representations learned using characters are more informed about word morphology compared to those learned using subword units; and (iv) Representations learned by multilingual models are richer compared to bilingual models.


“…Specifically, the target language ID j is given to the model so that it knows to which language it translates, and this can be readily implemented by setting the initial token y_0 = j for the target sentence to start with. The multilingual NMT model shares a single representation space across multiple languages, which has been found to facilitate translating low-resource language pairs (Firat et al., 2016a; Lee et al., 2016; Gu et al., 2018b,c).…”

Section: Introduction (mentioning)

confidence: 99%

Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations

Gu, Wang, Cho et al. 2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Self Cite


Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naïve training for zero-shot NMT easily fails, and is sensitive to hyper-parameter setting. The performance typically lags far behind the more conventional pivot-based approach which translates twice using a third language as a pivot. In this work, we address the degeneracy problem due to capturing spurious correlations by quantitatively analyzing the mutual information between language IDs of the source and decoded sentences. Inspired by this analysis, we propose to use two simple but effective approaches: (1) decoder pre-training; (2) backtranslation. These methods show significant improvement (4 ∼ 22 BLEU points) over the vanilla zero-shot translation on three challenging multilingual datasets, and achieve similar or better results than the pivot-based approach.


“…Recently, deep learning (DL) has become a driver of many new real‐world applications ranging from language translation to computer vision. A DL architecture, U‐net, has been successfully applied to predict dose distributions for prostate cancer radiotherapy.…”

Section: Introduction (mentioning)

confidence: 99%

Technical Note: A feasibility study on deep learning‐based radiotherapy dose calculation

Xing, Nguyen et al. 2019

Medical Physics


Purpose: Various dose calculation algorithms are available for radiation therapy for cancer patients. However, these algorithms are faced with the tradeoff between efficiency and accuracy. The fast algorithms are generally less accurate, while the accurate dose engines are often time consuming. In this work, we try to resolve this dilemma by exploring deep learning (DL) for dose calculation. Methods: We developed a new radiotherapy dose calculation engine based on a modified Hierarchically Densely Connected U‐net (HD U‐net) model and tested its feasibility with prostate intensity‐modulated radiation therapy (IMRT) cases. Mapping from an IMRT fluence map domain to a three‐dimensional (3D) dose domain requires a deep neural network of complicated architecture and a huge training dataset. To solve this problem, we first project the fluence maps to the dose domain using a broad beam ray‐tracing (RT) algorithm, and then we use the HD U‐net to map the RT dose distribution into an accurate dose distribution calculated using a collapsed cone convolution/superposition (CS) algorithm. The model is trained on 70 patients with fivefold cross validation, and tested on a separate 8 patients. Results: It takes about 1 s to compute a 3D dose distribution for a typical 7‐field prostate IMRT plan, which can be further reduced to achieve real‐time dose calculation by optimizing the network. The average Gamma passing rates between DL and CS dose distributions for the 8 test patients are 98.5% (±1.6%) at 1 mm/1% and 99.9% (±0.1%) at 2 mm/2%. For comparison of various clinical evaluation criteria (dose‐volume points) for IMRT plans between two dose distributions, the average difference for dose criteria is less than 0.25 Gy, while that for volume criteria is less than 0.16%, showing that the DL dose distributions are clinically identical to the CS dose distributions. Conclusions: We have shown the feasibility of using DL for calculating radiotherapy dose distribution with high accuracy and efficiency.
