2021
DOI: 10.48550/arxiv.2103.15316
Preprint

Whitening Sentence Representations for Better Semantics and Faster Retrieval

Abstract: Pre-training models such as BERT have achieved great success in many natural language processing tasks. However, how to obtain better sentence representations from these pre-trained models is still worth exploring. Previous work has shown that the anisotropy problem is a critical bottleneck for BERT-based sentence representations, hindering the model from fully utilizing the underlying semantic features. Therefore, some attempts at boosting the isotropy of the sentence distribution, such as flow-based models, h…

Cited by 75 publications (85 citation statements). References 16 publications.
“…Li et al. (2020) improve the embedding space to be isotropic via normalizing flows. The whitening operation is an alternative way to improve the isotropy of the embedding space (Su et al., 2021). It is typical to initialize such models with a pre-trained language model (Devlin et al., 2019) before training on NLI datasets.…”
Section: Related Work
Mentioning confidence: 99%
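
The whitening operation referenced in this statement is the transformation proposed in the cited paper: sentence embeddings are shifted to zero mean and their covariance is mapped toward the identity. A minimal NumPy sketch of that idea, assuming `embeddings` is an (n, d) array of pooled sentence vectors (array name and shapes are illustrative, not taken from the paper's code):

```python
import numpy as np

def compute_whitening(embeddings: np.ndarray):
    """Estimate the mean and whitening matrix from a set of sentence embeddings.

    Whitening maps the embedding distribution to zero mean and (approximately)
    identity covariance, which improves isotropy (Su et al., 2021).
    """
    mu = embeddings.mean(axis=0, keepdims=True)       # (1, d) mean vector
    cov = np.cov((embeddings - mu).T)                 # (d, d) covariance matrix
    u, s, _ = np.linalg.svd(cov)                      # SVD of the covariance
    w = u @ np.diag(1.0 / np.sqrt(s))                 # (d, d) whitening matrix
    return mu, w

def apply_whitening(embeddings: np.ndarray, mu: np.ndarray, w: np.ndarray):
    """Whiten embeddings: subtract the mean, then rotate and rescale by W."""
    return (embeddings - mu) @ w

# Usage with random vectors standing in for pooled BERT sentence embeddings.
x = np.random.randn(1000, 768)
mu, w = compute_whitening(x)
x_whitened = apply_whitening(x, mu, w)
```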
“…But the [CLS] embedding is non-smooth and anisotropic in the semantic space, which is not conducive to STS tasks; this is known as the representation degradation problem (Gao et al., 2019). BERT-Flow (Li et al., 2020) and BERT-whitening (Su et al., 2021) address the degradation problem by post-processing the output of BERT. SimCSE found that utilizing a contrastive mechanism can also alleviate this problem.…”
Section: Related Work
Mentioning confidence: 99%
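
The degradation problem described here can be checked empirically: in an anisotropic space, even unrelated sentences end up with high cosine similarity. The diagnostic below (average pairwise cosine similarity) is a common heuristic rather than a measure taken from the cited papers; a sketch:

```python
import numpy as np

def average_cosine_similarity(embeddings: np.ndarray) -> float:
    """Average pairwise cosine similarity over a set of embeddings.

    Values close to 1 suggest an anisotropic (cone-shaped) space;
    values close to 0 suggest an isotropic one.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(embeddings)
    # Exclude the diagonal (self-similarity) from the average.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))
```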
“…In both BERT-whitening (Su et al., 2021) and MoCo, it is mentioned that the embedding dimension can have some impact on the performance of the model. Therefore, we also changed the sentence-embedding dimension in MoCoSE and trained the model several times to observe the impact of the embedding dimension.…”
Section: A8 Dimension of Sentence Embedding
Mentioning confidence: 99%
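
The embedding-dimension question raised in this statement connects directly to whitening: since the SVD orders directions by variance, the whitening matrix can be truncated to its first k columns to obtain a smaller embedding, which is the dimensionality-reduction variant reported in Su et al. (2021) and the source of the "faster retrieval" in the title. A hedged sketch; the function name and the default k are illustrative:

```python
import numpy as np

def whiten_and_reduce(embeddings: np.ndarray, k: int = 256) -> np.ndarray:
    """Whiten embeddings and keep only the first k components.

    Truncating the whitening matrix to its top-k columns (ordered by
    variance) reduces the embedding dimension while preserving isotropy.
    """
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov((embeddings - mu).T)
    u, s, _ = np.linalg.svd(cov)
    w = (u @ np.diag(1.0 / np.sqrt(s)))[:, :k]   # keep the top-k directions
    return (embeddings - mu) @ w                 # (n, k) reduced embeddings
```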
“…The encoder can be BERT (Devlin et al., 2019), RoBERTa, etc. The pooling type can be selected from CLS, Last-Avg, First-Last-Avg, and Last2-Avg (Su et al., 2021). The multi-persp linears consist of N dense linear layers, which convert the single sentence embedding into N normalized embeddings corresponding to the N perspectives.…”
Section: Bi-encoder
Mentioning confidence: 99%
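
The pooling types listed in this statement differ only in which hidden states are averaged before any post-processing. A minimal PyTorch sketch, assuming `hidden_states` is the per-layer tuple returned by a Hugging Face BERT model with `output_hidden_states=True`; treating index 1 as the "first" layer is an assumption, since implementations vary on whether the embedding layer counts:

```python
import torch

def pool(hidden_states, attention_mask, pooling: str = "first_last_avg"):
    """Pool per-token hidden states into one sentence embedding.

    hidden_states: tuple of (batch, seq_len, dim) tensors, one per layer
                   (index 0 = embedding layer, index -1 = last layer).
    attention_mask: (batch, seq_len) tensor, 1 for real tokens, 0 for padding.
    """
    mask = attention_mask.unsqueeze(-1).float()          # (batch, seq_len, 1)

    def masked_mean(layer):
        return (layer * mask).sum(dim=1) / mask.sum(dim=1)

    if pooling == "cls":
        return hidden_states[-1][:, 0]                   # [CLS] token, last layer
    if pooling == "last_avg":
        return masked_mean(hidden_states[-1])            # mean over last layer
    if pooling == "first_last_avg":
        return (masked_mean(hidden_states[1])            # first Transformer layer
                + masked_mean(hidden_states[-1])) / 2    # averaged with the last
    if pooling == "last2_avg":
        return (masked_mean(hidden_states[-1])
                + masked_mean(hidden_states[-2])) / 2    # last two layers
    raise ValueError(f"unknown pooling type: {pooling}")
```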