Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019) 2019
DOI: 10.18653/v1/w19-4305
Multilingual NMT with a Language-Independent Attention Bridge

Abstract: In this paper, we propose a multilingual encoder-decoder architecture capable of obtaining multilingual sentence representations by means of incorporating an intermediate attention bridge that is shared across all languages. That is, we train the model with language-specific encoders and decoders that are connected via self-attention with a shared layer that we call attention bridge. This layer exploits the semantics from each language for performing translation and develops into a language-independent meaning…

Cited by 31 publications (36 citation statements)
References 30 publications
“…Hence, the decoder's attention mechanism sees a variable number of encoder representations for equivalent sentences across languages. To overcome this, an attention bridge network generates a fixed number of contextual representations that are input to the attention network [94,149]. By minimizing the diversity of representations, the decoder's task is simplified and it becomes better at language generation.…”
Section: Addressing Language Divergence (mentioning)
confidence: 99%
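To make the fixed-size bridge concrete, the following PyTorch sketch maps a variable-length sequence of encoder states to exactly r contextual vectors via structured self-attention. It illustrates the idea only and is not the authors' implementation; the module name AttentionBridge and the dimensions d_model, d_attn and r = 10 are assumptions chosen to match the quoted description.

import torch
import torch.nn as nn

class AttentionBridge(nn.Module):
    """Sketch of a fixed-size attention bridge (illustrative, not the paper's code).

    Maps encoder states H of shape (batch, n, d_model), with variable n,
    to exactly r contextual vectors of shape (batch, r, d_model), so the
    decoder always attends over the same number of positions regardless
    of source language or sentence length.
    """

    def __init__(self, d_model: int = 512, d_attn: int = 256, r: int = 10):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_attn, bias=False)  # project encoder states
        self.w2 = nn.Linear(d_attn, r, bias=False)        # score r attention "heads"

    def forward(self, enc_states: torch.Tensor) -> torch.Tensor:
        # enc_states: (batch, n, d_model)
        scores = self.w2(torch.tanh(self.w1(enc_states)))  # (batch, n, r)
        attn = torch.softmax(scores, dim=1)                # normalize over source positions
        bridge = attn.transpose(1, 2) @ enc_states         # (batch, r, d_model)
        return bridge

# Two source sentences of different lengths yield bridges of identical shape.
bridge = AttentionBridge()
short_src, long_src = torch.randn(1, 7, 512), torch.randn(1, 23, 512)
assert bridge(short_src).shape == bridge(long_src).shape == (1, 10, 512)

Because the output size depends only on r, the decoder's attention sees the same number of encoder representations for equivalent sentences across languages, which is the property the citation above highlights.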
“…the i-th column of I_emb represents an initial semantic subspace that guides what semantic information of H_enc should be attended to at the corresponding position i of the interlingua output. Here r means that every encoder output H_enc is mapped into a fixed-size representation of r hidden states, and it is set to 10 during all of our experiments, similar to the work of (Vázquez et al., 2018). By incorporating a shared interlingua embedding, we expect that it can exploit the semantics of various subspaces from the encoded representation, and that the same semantic components of different sentences, from both the same and different languages, should be mapped into the same posi-…”
Section: Interlingua (mentioning)
confidence: 99%
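A minimal sketch of how such a shared interlingua embedding can act as a fixed set of queries over the encoder output is given below, using plain scaled dot-product attention. The class name InterlinguaLayer and the single-layer form are assumptions made for brevity; the settings quoted next use three interlingua layers.

import torch
import torch.nn as nn

class InterlinguaLayer(nn.Module):
    """Sketch of a shared interlingua embedding (illustrative sizes).

    A learned matrix I_emb of r query vectors is shared across all
    language pairs; each query attends over the encoder output H_enc,
    compressing every sentence into r hidden states.
    """

    def __init__(self, d_model: int = 512, r: int = 10):
        super().__init__()
        self.i_emb = nn.Parameter(torch.randn(r, d_model))  # shared interlingua queries

    def forward(self, h_enc: torch.Tensor) -> torch.Tensor:
        # h_enc: (batch, n, d_model) -> output: (batch, r, d_model)
        queries = self.i_emb.unsqueeze(0).expand(h_enc.size(0), -1, -1)
        scores = queries @ h_enc.transpose(1, 2) / h_enc.size(-1) ** 0.5
        attn = torch.softmax(scores, dim=-1)                # attend over source positions
        return attn @ h_enc

Because the queries are shared across languages, the same column of I_emb is encouraged to pick out the same kind of semantic content from every encoder, which is the intuition stated in the quote above.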
“…During all our experiments, we follow the settings of Transformer-base (Vaswani et al., 2017) with hidden/embedding size 512, 6 hidden layers and 8 attention heads. We set 3 layers for Interlingua, and r = 10, similar to the work of (Vázquez et al., 2018). We apply sub-word NMT (Sennrich et al., 2015), where a joint BPE model is trained for all languages with 50,000 operations.…”
Section: Experimental Settings (mentioning)
confidence: 99%
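The hyperparameters quoted above can be collected into a small configuration sketch. The key names are illustrative and not tied to any particular toolkit, and applying the 6 hidden layers to both encoder and decoder is an assumption.

# Configuration sketch summarizing the quoted settings (illustrative key names).
transformer_base_config = {
    "hidden_size": 512,              # hidden/embedding size
    "encoder_layers": 6,             # "6 hidden layers"; assumed for both
    "decoder_layers": 6,             #   encoder and decoder
    "attention_heads": 8,
    "interlingua_layers": 3,
    "interlingua_size_r": 10,        # fixed number of bridge states
    "bpe_merge_operations": 50_000,  # joint BPE model across all languages
}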
“…The model-centric approaches, on the other hand, center on adjusting the training objectives (Wang et al., 2017; Tan et al., 2019b); modifying the model architectures (Vázquez et al., 2019; Dou et al., 2019a); and tweaking the decoding procedure (Hasler et al., 2018; Dou et al., 2019b).…”
Section: Related Work (mentioning)
confidence: 99%