SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models

Wang, Bin; Kuo, C.-C. Jay

doi:10.48550/arxiv.2002.06652

Cited by 10 publications

(13 citation statements)

References 50 publications


“…The probing tasks examine linguistic information at the surface-level (how well embeddings encode surface knowledge that does not require linguistic information); the syntactic-level (how well the embeddings encode the grammatical structure of a sentence); and the semantic-level (how well the embeddings encode the meaning and logistics behind the sentences). For evaluating on SentEval, we use the scripts provided by SBERT-WK (Wang and Kuo 2020). We use the "CLS" embedding method for BERT-base and XLNet-base, while "Ave last hidden" for the SBERT-base model and replicate the results with the original paper.…”

Section: Methods (mentioning)

confidence: 99%

Transferring Semantic Knowledge Into Language Encoders

Umair, Ferraro

2021

Preprint


We introduce semantic form mid-tuning, an approach for transferring semantic knowledge from semantic meaning representations into transformer-based language encoders. In mid-tuning, we learn to align the text of general sentences (not tied to any particular inference task) and structured semantic representations of those sentences. Our approach does not require gold annotated semantic representations. Instead, it makes use of automatically generated semantic representations, such as from off-the-shelf PropBank and FrameNet semantic parsers. We show that this alignment can be learned implicitly via classification or directly via triplet loss. Our method yields language encoders that demonstrate improved predictive performance across inference, reading comprehension, textual similarity, and other semantic tasks drawn from the GLUE, SuperGLUE, and SentEval benchmarks. We evaluate our approach on three popular baseline models. Our experimental results and analysis show that current pre-trained language models can further benefit from structured semantic frames with the proposed mid-tuning method: the frames inject additional task-agnostic knowledge into the encoder, improving the generated embeddings as well as the linguistic properties of the given model, as evidenced by improvements on a popular sentence embedding toolkit and a variety of probing tasks.
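The abstract above mentions learning the text-to-semantics alignment "directly via triplet loss" without giving details. As a rough illustration only, the following sketch shows a generic margin-based triplet loss. It assumes the anchor is a sentence embedding, the positive is the embedding of that sentence's own semantic representation, and the negative is the embedding of some other sentence's representation; the Euclidean distance, the margin value, and the function name `triplet_loss` are assumptions, not details taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic margin-based triplet loss (hypothetical helper, not the paper's code).

    Each argument is a (batch, dim) array. The loss is zero once every
    anchor is closer to its positive than to its negative by at least `margin`.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-to-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-to-negative distance
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# Toy vectors standing in for encoder outputs (assumed values).
anchor = np.array([[0.0, 0.0]])
near = np.array([[0.0, 1.0]])   # distance 1 from the anchor
far = np.array([[3.0, 4.0]])    # distance 5 from the anchor

loss_good = triplet_loss(anchor, near, far)  # positive is the nearer vector
loss_bad = triplet_loss(anchor, far, near)   # roles swapped
```

With the nearer vector as the positive the margin is already satisfied, so the loss vanishes; swapping the roles produces a positive penalty that a gradient step would reduce.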


“…Moreover, we customize the NT-Xent loss (Chen et al, 2020), a contrastive learning objective widely used in computer vision, for better sentence representation learning with BERT. We demonstrate that our approach outperforms competitive baselines designed for building BERT sentence vectors (Li et al, 2020; Wang and Kuo, 2020) in various environments. With comprehensive analyses, we also show that our method is more computationally efficient than the baselines at inference in addition to being more robust to domain shifts.…”

Section: Introduction (mentioning)

confidence: 93%

“…Meanwhile, some other studies concentrate on more effectively leveraging the knowledge embedded in BERT to construct sentence embeddings without supervision. Specifically, Wang and Kuo (2020) propose a pooling method based on linear algebraic algorithms to draw sentence vectors from BERT's intermediate layers. Li et al (2020) suggest to learn a mapping from the average of the embeddings obtained from the last two layers of BERT to a spherical Gaussian distribution using a flow model, and to leverage the redistributed embeddings in place of the original BERT representations.…”

Section: Related Work (mentioning)

confidence: 99%

“…
• Mean pooling: This method conducts mean pooling on the last layer of BERT and uses the output as a sentence embedding.
• WK pooling: This follows the method of Wang and Kuo (2020), which exploits QR decomposition and extra techniques to derive meaningful sentence vectors from BERT.
• Flow: This is BERT-flow proposed by Li et al (2020), which is a flow-based model that maps the vectors made by taking mean pooling on the last two layers of BERT to a Gaussian space.
…”

Section: Semantic Textual Similarity Tasks (mentioning)

confidence: 99%
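The mean pooling baseline described in the statement above can be sketched in a few lines. This is a minimal illustration, assuming the last-layer token vectors are already available as a NumPy array together with an attention mask that marks padding; the function name `mean_pool` and the toy values are hypothetical, and the sketch does not cover WK pooling or BERT-flow.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over real (non-padding) positions.

    token_embeddings: (batch, seq_len, hidden) array, e.g. a last-layer output.
    attention_mask:   (batch, seq_len) array of 1s (tokens) and 0s (padding).
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)   # sum over unmasked tokens
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid division by zero
    return summed / counts

# Toy input: one sentence with two real tokens and one padding position.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
sentence_vec = mean_pool(tokens, mask)  # the padding vector is ignored
```

Masking matters because padded positions would otherwise pull the average toward whatever values the padding slots happen to hold.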


Self-Guided Contrastive Learning for BERT Sentence Representations

Kim, Yoo, Lee

2021

Preprint


Although BERT and its variants have reshaped the NLP landscape, it still remains unclear how best to derive sentence embeddings from such pre-trained Transformers. In this work, we propose a contrastive learning method that utilizes self-guidance for improving the quality of BERT sentence representations. Our method fine-tunes BERT in a self-supervised fashion, does not rely on data augmentation, and enables the usual [CLS] token embeddings to function as sentence vectors. Moreover, we redesign the contrastive learning objective (NT-Xent) and apply it to sentence representation learning. We demonstrate with extensive experiments that our approach is more effective than competitive baselines on diverse sentence-related tasks. We also show it is efficient at inference and robust to domain shifts.
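For readers unfamiliar with NT-Xent, the sketch below shows the standard form of the objective as introduced by Chen et al. (2020), not the redesigned variant that the abstract above refers to. It assumes two arrays of paired embeddings where row i of one is the positive for row i of the other; the function name `nt_xent`, the temperature value, and the random toy data are assumptions made for illustration.

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.1):
    """Standard NT-Xent loss for N positive pairs (z1[i], z2[i])."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm, so dot = cosine
    sim = z @ z.T / temperature                        # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # a vector is never its own candidate
    # Row i's positive sits N rows away in the concatenated batch.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Toy check: correctly paired views should score better than mismatched ones.
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 8))
loss_aligned = nt_xent(a, a)           # each positive is an identical vector
loss_shuffled = nt_xent(a, a[::-1])    # each positive is a different vector
```

Every other vector in the batch acts as a negative, which is why the loss needs no explicit negative sampling.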


“…For embedding into Euclidean spaces, a large body of work is based on Word2vec (Mikolov et al, 2013), where each word is represented as a vector in the Euclidean space. From these word embeddings one can further compute document and sentence embeddings using various models Ramos et al (2003), Arora et al (2017), Wang and Kuo (2020), Le and Mikolov (2014), Kiros et al (2015), Logeswaran and Lee (2018) for higher level NLP tasks.…”

Section: Introduction (mentioning)

confidence: 99%

Interpretable contrastive word mover's embedding

Jiang, Gouvea, Miller, et al.

2021

Preprint


This paper shows that a popular approach to the supervised embedding of documents for classification, namely, contrastive Word Mover's Embedding, can be significantly enhanced by adding interpretability. This interpretability is achieved by incorporating a clustering-promoting mechanism into the contrastive loss. On several public datasets, we show that our method improves significantly upon existing baselines while providing interpretation to the clusters via identifying a set of keywords that are the most representative of a particular class. Our approach was motivated in part by the need to develop Natural Language Processing (NLP) methods for the novel problem of assessing student work for scientific writing and thinking, a problem that is central to the area of (educational) Learning Sciences (LS). In this context, we show that our approach leads to a meaningful assessment of the student work related to lab reports from a biology class and can help LS researchers gain insights into student understanding and assess evidence of scientific thought processes.
