2018
DOI: 10.48550/arxiv.1803.07416
Preprint

Tensor2Tensor for Neural Machine Translation

Abstract: Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model. Neural Machine Translation Background: Machine translation using deep neural networks achieved great success with sequence-to-sequence models (Sutskever et al., 2014; Bahdanau et al., 2014; Cho et al., 2014) that used recurrent neural networks (RNNs) with LSTM cells (Hochreiter and Schmidhuber, 1997). The basic sequence-to-sequenc…
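
As background for the Transformer that Tensor2Tensor ships a reference implementation of, the following is a minimal NumPy sketch of scaled dot-product attention, the operation at the model's core. It is an illustrative sketch only: the function name and shapes are assumptions, and it is not drawn from the Tensor2Tensor API.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
    the core operation of the Transformer (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (len_q, len_k)
    scores -= scores.max(axis=-1, keepdims=True)    # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                              # (len_q, d_v)

# Toy example: one query position attending over four key/value positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 64))
K = rng.normal(size=(4, 64))
V = rng.normal(size=(4, 64))
print(scaled_dot_product_attention(Q, K, V).shape)  # (1, 64)
```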

Cited by 56 publications (66 citation statements)
References 6 publications

“…2) Model: Base model. Paraphrase identification is a well-studied sentence pair modeling task and is very useful for many NLP applications such as machine translation (MT) [14], question answering (QA) [15], and information retrieval (IR) [16]. Many methods have been proposed for it in recent years, including pairwise word interaction modeling with a deep neural network system [17], character-level neural network models [18], and pre-trained language models [2].…”
Section: Results
mentioning
confidence: 99%
“…For all investigated approaches, including the non-fusion baselines Baseline-mag and Baseline-phase, we use beam-search decoding during inference and deviate slightly from the standard Transformer architecture in [6] by applying layer normalization before each attention or stack of fully connected layers, following the implementation in [33]. During decoding, the final output $P^{\text{final}}_{\ell}$ of all approaches can optionally be computed as…”
Section: Language Model and Decoding
mentioning
confidence: 99%
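
The deviation described in the quote above, layer normalization applied before each sublayer rather than after, can be sketched in a few lines. This is a hedged illustration, not the cited paper's code: `sublayer` stands for either self-attention or the feed-forward stack, and the layer norm omits the learned scale and shift parameters for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the feature dimension (learned scale/shift omitted).
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def post_ln_block(x, sublayer):
    # Standard Transformer (Vaswani et al., 2017): norm *after* the residual add.
    return layer_norm(x + sublayer(x))

def pre_ln_block(x, sublayer):
    # Pre-LN variant from the quote: norm *before* each sublayer; the residual
    # path stays unnormalized, which tends to stabilize deep-stack training.
    return x + sublayer(layer_norm(x))

# Shape check with a dummy sublayer standing in for attention/feed-forward.
x = np.random.default_rng(0).normal(size=(2, 8))
assert pre_ln_block(x, lambda h: 0.5 * h).shape == x.shape
```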
“…In this study, we propose a novel method, ComFormer, via the Transformer [13] and a fusion-based hybrid code representation. Our method uses the Transformer because this deep learning model can achieve better performance than traditional sequence-to-sequence models in classical natural language processing (NLP) tasks (such as neural machine translation [14] [15]) and in software engineering [16]. Moreover, our method also utilizes the hybrid code representation to effectively learn the semantics of the code, since this representation can extract both lexical-level and syntactic-level information from the code.…”
Section: Introduction
mentioning
confidence: 99%
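
To make the quoted idea of a hybrid code representation concrete, here is a toy Python sketch that pairs a lexical-level view (the token stream) with a syntactic-level view (AST node types). It is an assumption-laden illustration, not ComFormer's actual pipeline: the function names, the `<SEP>` separator, and the use of Python's tokenize and ast modules are all stand-ins.

```python
import ast
import io
import tokenize

def lexical_tokens(source: str):
    # Lexical-level view: the raw token strings of the code.
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return [t.string for t in toks if t.string.strip()]

def syntactic_tokens(source: str):
    # Syntactic-level view: AST node type names from a tree traversal.
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

source = "def add(a, b):\n    return a + b\n"
hybrid = lexical_tokens(source) + ["<SEP>"] + syntactic_tokens(source)
print(hybrid)  # e.g. ['def', 'add', '(', ..., '<SEP>', 'Module', 'FunctionDef', ...]
```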