The Transformer model (Vaswani et al. 2017) has been widely used in machine translation and has obtained state-of-the-art results. In this paper, we report an interesting phenomenon in its encoder-decoder multi-head attention: different attention heads of the final decoder layer align to different word translation candidates. We empirically verify this discovery and propose a method to generate diverse translations by manipulating heads. Furthermore, we use these diverse translations with the back-translation technique for better data augmentation. Experimental results show that our method generates diverse translations without a severe drop in translation quality. Experiments also show that back-translation with these diverse translations brings a significant improvement in performance on translation tasks. An auxiliary experiment on a conversation response generation task confirms the effect of diversity as well.
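As a rough illustration of the underlying mechanism (not the authors' implementation), the sketch below computes encoder-decoder multi-head attention with a per-head mask: keeping only one head's output forces the layer to follow that head's alignment, which is one plausible way to "manipulate heads" for an alternative translation. The function name, the shapes, and the omission of the learned projection matrices are all simplifications made here for brevity.

import torch

def cross_attention_with_head_mask(q, k, v, num_heads, head_mask):
    # Simplified multi-head cross-attention (learned W_q/W_k/W_v/W_o of a real
    # Transformer layer are omitted). Each head's output is scaled by
    # head_mask[h]; zeroing all but one head makes decoding rely on that
    # single head's source alignment.
    B, Tq, D = q.shape
    Tk = k.shape[1]
    d_head = D // num_heads
    qh = q.view(B, Tq, num_heads, d_head).transpose(1, 2)  # [B, H, Tq, d_head]
    kh = k.view(B, Tk, num_heads, d_head).transpose(1, 2)
    vh = v.view(B, Tk, num_heads, d_head).transpose(1, 2)
    attn = torch.softmax(qh @ kh.transpose(-2, -1) / d_head ** 0.5, dim=-1)
    out = attn @ vh                                        # [B, H, Tq, d_head]
    out = out * head_mask.view(1, num_heads, 1, 1)         # suppress unwanted heads
    return out.transpose(1, 2).reshape(B, Tq, D)

# Usage: keep only head 2 of an 8-head layer (shapes are illustrative).
B, Tq, Tk, D, H = 1, 5, 7, 512, 8
q, k, v = (torch.randn(B, t, D) for t in (Tq, Tk, Tk))
mask = torch.zeros(H)
mask[2] = 1.0
y = cross_attention_with_head_mask(q, k, v, H, mask)

Decoding once per kept head would yield up to H candidate translations, which could then feed a back-translation pipeline for data augmentation.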
Can a BERT pre-trained on one language and a GPT pre-trained on another be glued together to translate texts? Self-supervised training using only monolingual data has led to the success of pre-trained (masked) language models in many NLP tasks. However, directly connecting BERT as an encoder and GPT as a decoder is challenging for machine translation, because GPT-like models lack the cross-attention component that seq2seq decoders need. In this paper, we propose Graformer to graft separately pre-trained (masked) language models for machine translation. With monolingual data for pre-training and parallel data for grafting training, we take maximal advantage of both types of data. Experiments on 60 directions show that our method achieves average improvements of 5.8 BLEU in x2en and 2.9 BLEU in en2x directions compared with a multilingual Transformer of the same size.
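The following is a minimal sketch of the grafting idea under our own assumptions, not the paper's exact architecture: a newly initialized cross-attention block bridges a frozen encoder's states and a frozen decoder-only LM's states, and only the grafted block would be trained on parallel data. The stand-in encoder and decoder below are generic PyTorch modules used in place of pre-trained BERT/GPT (the stand-in "decoder" is not causal), purely to keep the example self-contained.

import torch
import torch.nn as nn

class GraftedCrossAttention(nn.Module):
    # Newly initialized block that lets decoder states attend over encoder states.
    def __init__(self, d_model: int = 768, num_heads: int = 12):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, dec_hidden, enc_hidden):
        attn_out, _ = self.cross_attn(dec_hidden, enc_hidden, enc_hidden)
        return self.norm(dec_hidden + attn_out)  # residual + layer norm

# Stand-ins for pre-trained models, frozen during grafting training.
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(768, 12, batch_first=True), 2)
decoder_body = nn.TransformerEncoder(nn.TransformerEncoderLayer(768, 12, batch_first=True), 2)
for p in list(encoder.parameters()) + list(decoder_body.parameters()):
    p.requires_grad = False

graft = GraftedCrossAttention()                 # only these weights are updated
src_h = encoder(torch.randn(1, 9, 768))         # source-side representations
tgt_h = decoder_body(torch.randn(1, 6, 768))    # target-side LM states
fused = graft(tgt_h, src_h)                     # grafted states for predicting target tokens

The design point the sketch tries to capture is that monolingual pre-training supplies both frozen towers, while parallel data only needs to train the small bridging component.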
Recent studies show that the attention heads in the Transformer are not equal (Michel et al., 2019). We relate this phenomenon to the imbalanced training of multi-head attention and the model's dependence on specific heads. To tackle this problem, we propose a simple masking method, HeadMask, in two specific variants. Experiments show that translation improvements are achieved on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
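As one possible reading of the idea (the abstract does not spell out the two variants, so the code below is our assumption, not the paper's method): randomly masking some heads' outputs during training, dropout-style, discourages the model from depending on a few specific heads.

import torch

def random_head_mask(batch_size: int, num_heads: int, mask_prob: float = 0.25):
    # Per-example 0/1 mask over heads; roughly mask_prob of the heads are dropped.
    keep = (torch.rand(batch_size, num_heads) >= mask_prob).float()
    # Rescale, as in dropout, so the expected magnitude of the combined heads is preserved.
    return keep / (1.0 - mask_prob)

def apply_head_mask(head_outputs: torch.Tensor, mask: torch.Tensor):
    # head_outputs: [batch, heads, tgt_len, head_dim]; mask: [batch, heads]
    return head_outputs * mask.view(*mask.shape, 1, 1)

# Usage inside a training step (shapes are illustrative).
outs = torch.randn(2, 8, 5, 64)
masked = apply_head_mask(outs, random_head_mask(2, 8))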