2018
DOI: 10.48550/arxiv.1803.07416
Preprint

Tensor2Tensor for Neural Machine Translation

Abstract: Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model. Neural Machine Translation Background: Machine translation using deep neural networks achieved great success with sequence-to-sequence models (Sutskever et al., 2014; Bahdanau et al., 2014; Cho et al., 2014) that used recurrent neural networks (RNNs) with LSTM cells (Hochreiter and Schmidhuber, 1997). The basic sequence-to-sequenc…
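
As background for the Transformer that Tensor2Tensor ships a reference implementation of, the following is a minimal NumPy sketch of scaled dot-product attention, the operation at the model's core. It is an illustrative sketch only: the function name and shapes are assumptions, and it is not drawn from the Tensor2Tensor API.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
    the core operation of the Transformer (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (len_q, len_k)
    scores -= scores.max(axis=-1, keepdims=True)    # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                              # (len_q, d_v)

# Toy example: one query position attending over four key/value positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 64))
K = rng.normal(size=(4, 64))
V = rng.normal(size=(4, 64))
print(scaled_dot_product_attention(Q, K, V).shape)  # (1, 64)
```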

Cited by 56 publications (66 citation statements)
References 6 publications

“…2) Model: Base model. Paraphrase identification is a well-studied sentence pair modeling task and is very useful for many NLP applications such as machine translation (MT) [14], question answering (QA) [15], and information retrieval (IR) [16]. Many methods have been proposed for it in recent years, including pairwise word interaction modeling with a deep neural network system [17], character-level neural network models [18], and pre-trained language models [2].…”
Section: Results
mentioning
confidence: 99%
“…For all investigated approaches, including the non-fusion baselines Baseline-mag and Baseline-phase, we use beam-search decoding during inference and deviate slightly from the standard Transformer architecture in [6] by applying layer normalization before each attention or stack of fully connected layers, following the implementation in [33]. During decoding, the final output $P^{\text{final}}_{\ell}$ of all approaches can optionally be computed as…”
Section: Language Model and Decoding
mentioning
confidence: 99%
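
The deviation described in the quote above, layer normalization applied before each sublayer rather than after, can be sketched in a few lines. This is a hedged illustration, not the cited paper's code: `sublayer` stands for either self-attention or the feed-forward stack, and the layer norm omits the learned scale and shift parameters for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the feature dimension (learned scale/shift omitted).
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def post_ln_block(x, sublayer):
    # Standard Transformer (Vaswani et al., 2017): norm *after* the residual add.
    return layer_norm(x + sublayer(x))

def pre_ln_block(x, sublayer):
    # Pre-LN variant from the quote: norm *before* each sublayer; the residual
    # path stays unnormalized, which tends to stabilize deep-stack training.
    return x + sublayer(layer_norm(x))

# Shape check with a dummy sublayer standing in for attention/feed-forward.
x = np.random.default_rng(0).normal(size=(2, 8))
assert pre_ln_block(x, lambda h: 0.5 * h).shape == x.shape
```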
“…In this study, we propose a novel method, ComFormer, via the Transformer [13] and a fusion-based hybrid code representation. Our method uses the Transformer because this deep learning model can achieve better performance than traditional sequence-to-sequence models in classical natural language processing (NLP) tasks (such as neural machine translation [14] [15]) and in software engineering [16]. Moreover, our method also utilizes the hybrid code representation to effectively learn the semantics of the code, since this representation can extract both lexical-level and syntactic-level information from the code.…”
Section: Introduction
mentioning
confidence: 99%
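
To make the quoted idea of a hybrid code representation concrete, here is a toy Python sketch that pairs a lexical-level view (the token stream) with a syntactic-level view (AST node types). It is an assumption-laden illustration, not ComFormer's actual pipeline: the function names, the `<SEP>` separator, and the use of Python's tokenize and ast modules are all stand-ins.

```python
import ast
import io
import tokenize

def lexical_tokens(source: str):
    # Lexical-level view: the raw token strings of the code.
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return [t.string for t in toks if t.string.strip()]

def syntactic_tokens(source: str):
    # Syntactic-level view: AST node type names from a tree traversal.
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

source = "def add(a, b):\n    return a + b\n"
hybrid = lexical_tokens(source) + ["<SEP>"] + syntactic_tokens(source)
print(hybrid)  # e.g. ['def', 'add', '(', ..., '<SEP>', 'Module', 'FunctionDef', ...]
```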