2021
DOI: 10.48550/arxiv.2103.15316
Preprint

Whitening Sentence Representations for Better Semantics and Faster Retrieval

Abstract: Pre-training models such as BERT have achieved great success in many natural language processing tasks. However, how to obtain better sentence representations from these pre-trained models is still worth exploring. Previous work has shown that the anisotropy problem is a critical bottleneck for BERT-based sentence representations, hindering the model from fully utilizing the underlying semantic features. Therefore, some attempts at boosting the isotropy of the sentence distribution, such as flow-based models, h…

Cited by 75 publications (85 citation statements). References 16 publications.
“…Li et al. (2020) improve the embedding space to be isotropic via normalizing flows. The whitening operation is an alternative way to improve the isotropy of the embedding space (Su et al., 2021). It is typical to initialize such models with a pre-trained language model (Devlin et al., 2019) before training on NLI datasets.…”
Section: Related Work
Mentioning confidence: 99%
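
The whitening operation referenced in this statement is the transformation proposed in the cited paper: sentence embeddings are shifted to zero mean and their covariance is mapped toward the identity. A minimal NumPy sketch of that idea, assuming `embeddings` is an (n, d) array of pooled sentence vectors (array name and shapes are illustrative, not taken from the paper's code):

```python
import numpy as np

def compute_whitening(embeddings: np.ndarray):
    """Estimate the mean and whitening matrix from a set of sentence embeddings.

    Whitening maps the embedding distribution to zero mean and (approximately)
    identity covariance, which improves isotropy (Su et al., 2021).
    """
    mu = embeddings.mean(axis=0, keepdims=True)       # (1, d) mean vector
    cov = np.cov((embeddings - mu).T)                 # (d, d) covariance matrix
    u, s, _ = np.linalg.svd(cov)                      # SVD of the covariance
    w = u @ np.diag(1.0 / np.sqrt(s))                 # (d, d) whitening matrix
    return mu, w

def apply_whitening(embeddings: np.ndarray, mu: np.ndarray, w: np.ndarray):
    """Whiten embeddings: subtract the mean, then rotate and rescale by W."""
    return (embeddings - mu) @ w

# Usage with random vectors standing in for pooled BERT sentence embeddings.
x = np.random.randn(1000, 768)
mu, w = compute_whitening(x)
x_whitened = apply_whitening(x, mu, w)
```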
“…But the [CLS] embedding is non-smooth and anisotropic in the semantic space, which is not conducive to STS tasks; this is known as the representation degradation problem (Gao et al., 2019). BERT-Flow (Li et al., 2020) and BERT-whitening (Su et al., 2021) address the degradation problem by post-processing the output of BERT. SimCSE found that utilizing a contrastive mechanism can also alleviate this problem.…”
Section: Related Work
Mentioning confidence: 99%
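
The degradation problem described here can be checked empirically: in an anisotropic space, even unrelated sentences end up with high cosine similarity. The diagnostic below (average pairwise cosine similarity) is a common heuristic rather than a measure taken from the cited papers; a sketch:

```python
import numpy as np

def average_cosine_similarity(embeddings: np.ndarray) -> float:
    """Average pairwise cosine similarity over a set of embeddings.

    Values close to 1 suggest an anisotropic (cone-shaped) space;
    values close to 0 suggest an isotropic one.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(embeddings)
    # Exclude the diagonal (self-similarity) from the average.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))
```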
“…In both BERT-whitening (Su et al., 2021) and MoCo, it is mentioned that the embedding dimension can have some impact on the performance of the model. Therefore, we also changed the sentence-embedding dimension in MoCoSE and trained the model several times to observe the impact of the embedding dimension.…”
Section: A8 Dimension of Sentence Embedding
Mentioning confidence: 99%
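
The embedding-dimension question raised in this statement connects directly to whitening: since the SVD orders directions by variance, the whitening matrix can be truncated to its first k columns to obtain a smaller embedding, which is the dimensionality-reduction variant reported in Su et al. (2021) and the source of the "faster retrieval" in the title. A hedged sketch; the function name and the default k are illustrative:

```python
import numpy as np

def whiten_and_reduce(embeddings: np.ndarray, k: int = 256) -> np.ndarray:
    """Whiten embeddings and keep only the first k components.

    Truncating the whitening matrix to its top-k columns (ordered by
    variance) reduces the embedding dimension while preserving isotropy.
    """
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov((embeddings - mu).T)
    u, s, _ = np.linalg.svd(cov)
    w = (u @ np.diag(1.0 / np.sqrt(s)))[:, :k]   # keep the top-k directions
    return (embeddings - mu) @ w                 # (n, k) reduced embeddings
```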
“…The encoder can be BERT (Devlin et al., 2019), RoBERTa, etc. The pooling type can be selected from CLS, Last-Avg, First-Last-Avg, and Last2-Avg (Su et al., 2021). The multi-persp linears consist of N dense linear layers, which convert the single sentence embedding into N normalized embeddings corresponding to the N perspectives.…”
Section: Bi-encoder
Mentioning confidence: 99%
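
The pooling types listed in this statement differ only in which hidden states are averaged before any post-processing. A minimal PyTorch sketch, assuming `hidden_states` is the per-layer tuple returned by a Hugging Face BERT model with `output_hidden_states=True`; treating index 1 as the "first" layer is an assumption, since implementations vary on whether the embedding layer counts:

```python
import torch

def pool(hidden_states, attention_mask, pooling: str = "first_last_avg"):
    """Pool per-token hidden states into one sentence embedding.

    hidden_states: tuple of (batch, seq_len, dim) tensors, one per layer
                   (index 0 = embedding layer, index -1 = last layer).
    attention_mask: (batch, seq_len) tensor, 1 for real tokens, 0 for padding.
    """
    mask = attention_mask.unsqueeze(-1).float()          # (batch, seq_len, 1)

    def masked_mean(layer):
        return (layer * mask).sum(dim=1) / mask.sum(dim=1)

    if pooling == "cls":
        return hidden_states[-1][:, 0]                   # [CLS] token, last layer
    if pooling == "last_avg":
        return masked_mean(hidden_states[-1])            # mean over last layer
    if pooling == "first_last_avg":
        return (masked_mean(hidden_states[1])            # first Transformer layer
                + masked_mean(hidden_states[-1])) / 2    # averaged with the last
    if pooling == "last2_avg":
        return (masked_mean(hidden_states[-1])
                + masked_mean(hidden_states[-2])) / 2    # last two layers
    raise ValueError(f"unknown pooling type: {pooling}")
```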