Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/746
Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax

Abstract: In this paper, we present an approach to learn multilingual sentence embeddings using a bi-directional dual-encoder with additive margin softmax. The embeddings are able to achieve state-of-the-art results on the United Nations (UN) parallel corpus retrieval task. In all the languages tested, the system achieves P@1 of 86% or higher. We use pairs retrieved by our approach to train NMT models that achieve similar performance to models trained on gold pairs. We explore simple document-level embeddings constructed by averaging our sentence embeddings, which also perform strongly on UN document-level retrieval.
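A minimal sketch of the bi-directional dual-encoder objective with an additive margin, as described in the abstract. This is an illustrative reconstruction, not the authors' code; the function and tensor names (src_emb, tgt_emb, margin) are assumptions, and PyTorch is used only for convenience.

```python
# Hedged sketch: bi-directional dual-encoder loss with additive margin softmax.
# src_emb / tgt_emb are assumed to be L2-normalized embeddings of aligned sentence pairs.
import torch
import torch.nn.functional as F

def bidirectional_additive_margin_loss(src_emb, tgt_emb, margin=0.3):
    """src_emb, tgt_emb: [batch, dim] embeddings of parallel sentences."""
    # Similarity matrix: entry (i, j) scores source sentence i against target sentence j.
    scores = src_emb @ tgt_emb.t()
    # Subtract an additive margin from the diagonal (the true translation pairs),
    # so each positive must beat every in-batch negative by at least `margin`.
    scores = scores - margin * torch.eye(scores.size(0), device=scores.device)
    labels = torch.arange(scores.size(0), device=scores.device)
    # Source-to-target and target-to-source softmax losses (the "bi-directional" part).
    loss_s2t = F.cross_entropy(scores, labels)
    loss_t2s = F.cross_entropy(scores.t(), labels)
    return loss_s2t + loss_t2s
```

The margin only tightens the positive scores, so at inference time retrieval is still a plain nearest-neighbour search over the unmodified similarity matrix.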

Cited by 68 publications (95 citation statements)
References 1 publication
“…Dual encoder models are learned functions that collocate queries and results in a shared embedding space. This architecture has shown strong performance on sentence-level retrieval tasks, including conversational response retrieval (Henderson et al., 2017), translation pair retrieval (Guo et al., 2018; Yang et al., 2019b) and similar text retrieval (Gillick et al., 2018). A dual encoder for use with ReQA has the schematic shape illustrated in Figure 3.…”
Section: Neural Baselines (mentioning)
confidence: 99%
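A hedged sketch of the dual-encoder shape described in the excerpt above: two encoders map queries and candidate results into one shared space, and retrieval reduces to nearest-neighbour search over their dot products. The encoder internals below are placeholders (a bag-of-embeddings stand-in), not the architecture of any cited model.

```python
# Hedged sketch of a dual encoder: two towers, one shared embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        # Bag-of-embeddings encoders stand in for real text encoders (assumption).
        self.query_encoder = nn.EmbeddingBag(vocab_size, dim)
        self.result_encoder = nn.EmbeddingBag(vocab_size, dim)

    def forward(self, query_ids, result_ids):
        """query_ids, result_ids: [batch, seq_len] token id tensors."""
        q = F.normalize(self.query_encoder(query_ids), dim=-1)
        r = F.normalize(self.result_encoder(result_ids), dim=-1)
        # [num_queries, num_results] similarity matrix used for retrieval.
        return q @ r.t()
```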
“…The triplet loss (3) used previously misses opportunities to learn against a wider set of negative examples, namely all those in the batch that are not known to be positively associated (i.e., M_ij = 1). To exploit these additional negatives, we minimize the Masked Margin Softmax (MMS) loss function, inspired by Henderson et al. (2017) and Yang et al. (2019). MMS simulates x-to-y and y-to-x retrievals inside the batch.…”
Section: Model (mentioning)
confidence: 99%
“…Contrasting Equations 3 and 4, the former chooses a negative sample randomly, while the latter takes advantage of all negative pairs in the batch and thus improves sample efficiency. L_MMS has three main differences from Yang et al. (2019): (1) a masking term that accounts for the fact that there might be multiple positive choices in the batch for a given input; (2) a varying margin term δ, which is increased during training; (3) a log term that makes MMS more closely resemble a cross-entropy loss. Optimization.…”
Section: Model (mentioning)
confidence: 99%
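A hedged sketch of a Masked Margin Softmax-style loss, reconstructed from the three points quoted above: a mask so that extra in-batch positives are not treated as negatives, a margin term δ (which, per the quote, would be increased on a schedule outside this function), and a log-softmax form that resembles cross-entropy. The tensor names (x_emb, y_emb, pos_mask, delta) are illustrative assumptions, not the citing paper's code.

```python
# Hedged sketch of a Masked Margin Softmax (MMS)-style loss.
import torch

def masked_margin_softmax(x_emb, y_emb, pos_mask, delta):
    """x_emb, y_emb: [batch, dim] embeddings; pos_mask[i, j] = 1 if (x_i, y_j) is a known positive."""
    batch = x_emb.size(0)
    diag = torch.eye(batch, device=x_emb.device)
    # In-batch similarity matrix with a margin on the aligned (diagonal) pairs.
    scores = x_emb @ y_emb.t()
    scores = scores - delta * diag
    # Masking term: off-diagonal pairs that are also positives must not act as negatives.
    extra_pos = (pos_mask - diag).clamp(min=0).bool()
    scores = scores.masked_fill(extra_pos, float('-inf'))
    # x-to-y and y-to-x in-batch retrieval, each as a log-softmax (cross-entropy-like) term.
    loss_x2y = -torch.diagonal(torch.log_softmax(scores, dim=1)).mean()
    loss_y2x = -torch.diagonal(torch.log_softmax(scores, dim=0)).mean()
    return loss_x2y + loss_y2x
```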
“…However, these sentence embedding methods have had limited success when applied to document-level mining tasks (Guo et al., 2018). A recent study from Yang et al. (2019) shows that document embeddings obtained from averaging sentence embeddings can achieve state-of-the-art performance in document retrieval on the United Nations (UN) corpus. This simple averaging approach, however, heavily relies on high-quality sentence embeddings and the cleanliness of documents in the application domain.…”
Section: Introduction (mentioning)
confidence: 99%
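A minimal sketch of the averaging approach described in this excerpt: a document embedding is simply the mean of its sentence embeddings, normalized for dot-product retrieval. `encode_sentences` stands in for any multilingual sentence embedding model and is an assumption, not a real API.

```python
# Hedged sketch: document embedding as the normalized mean of sentence embeddings.
import torch
import torch.nn.functional as F

def document_embedding(sentences, encode_sentences):
    """sentences: list of sentence strings from one document.
    encode_sentences: placeholder callable returning a [num_sentences, dim] tensor."""
    sent_emb = encode_sentences(sentences)
    doc_emb = sent_emb.mean(dim=0)            # average over the document's sentences
    return F.normalize(doc_emb, dim=-1)       # unit norm for cosine / dot-product retrieval
```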
“…In our work, we explore using three variants of document-level embeddings for parallel document mining: (i) simple averaging of embeddings from a multilingual sentence embedding model (Yang et al., 2019); (ii) trained document-level embeddings based on document unigrams; (iii) a simple hierarchical document encoder (HiDE) trained on document pairs using the output of our sentence-level model.…”
Section: Introduction (mentioning)
confidence: 99%
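A hedged sketch of what variant (iii) could look like: a small document encoder that pools frozen sentence-level embeddings into a single document vector, to be trained on document pairs with a dual-encoder loss such as the one sketched earlier. The layer choices below are pure assumptions; the excerpt does not specify the HiDE architecture.

```python
# Hedged sketch of a hierarchical document encoder over frozen sentence embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalDocEncoder(nn.Module):
    def __init__(self, sent_dim=512, doc_dim=512):
        super().__init__()
        # Illustrative choice: a learned projection before pooling (assumption).
        self.proj = nn.Sequential(nn.Linear(sent_dim, doc_dim), nn.Tanh())

    def forward(self, sent_embs):
        """sent_embs: [num_sentences, sent_dim] embeddings from the sentence-level model."""
        pooled = self.proj(sent_embs).mean(dim=0)   # pool sentence vectors into one document vector
        return F.normalize(pooled, dim=-1)
```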