Sentence Meta-Embeddings for Unsupervised Semantic Textual Similarity

Poerner, Nina; Waltinger, Ulli; Schütze, Hinrich

doi:10.18653/v1/2020.acl-main.628

Cited by 19 publications

(21 citation statements)

References 22 publications


“…Second, we expect that word prisms can improve performance in other tasks such as automatic summarization, which often use a single set of word embeddings in their input layers (Dong et al, 2019). Third, we believe that meta-embeddings and the method behind word prisms can be generalized past word-based representations to sentence representations (Pagliardini et al, 2018) and may improve their quality, as was recently demonstrated by Poerner et al (2019). Lastly, recent work has found simple word embeddings to be useful for solving diverse problems from the medical domain (Zhang et al, 2019), to materials science (Tshitoyan et al, 2019), to law (Chalkidis and Kampas, 2019); we expect that word prisms and their motivations can further improve results in these applications.…”

Section: Discussion (mentioning)

confidence: 84%

Learning Efficient Task-Specific Meta-Embeddings with Word Prisms

Tsiolis, Kenyon-Dean, et al. 2020

Proceedings of the 28th International Conference on Computational Linguistics


Word embeddings are trained to predict word cooccurrence statistics, which leads them to possess different lexical properties (syntactic, semantic, etc.) depending on the notion of context defined at training time. These properties manifest when querying the embedding space for the most similar vectors, and when used at the input layer of deep neural networks trained to solve downstream NLP problems. Meta-embeddings combine multiple sets of differently trained word embeddings, and have been shown to successfully improve intrinsic and extrinsic performance over equivalent models which use just one set of source embeddings. We introduce word prisms: a simple and efficient meta-embedding method that learns to combine source embeddings according to the task at hand. Word prisms learn orthogonal transformations to linearly combine the input source embeddings, which allows them to be very efficient at inference time. We evaluate word prisms in comparison to other meta-embedding methods on six extrinsic evaluations and observe that word prisms offer improvements in performance on all tasks.


“…There are many unsupervised approaches to obtaining sentence embeddings, for example by averaging word embeddings (Mikolov et al, 2013; Pennington et al, 2014; Bojanowski et al, 2017) or with carefully designed sentence-level objectives (Le and Mikolov, 2014; Kiros et al, 2015). Ensembling several methods improves results (Pörner and Schütze, 2019; Pörner et al, 2020). Recent work obtains sentence representations by supplementing BERT (Devlin et al, 2019) or other PLMs with additional unsupervised objectives (Zhang et al, 2020; Wu et al, 2020; Giorgi et al, 2020).…”

Section: Related Work (mentioning)

confidence: 99%

Generating Datasets with Pretrained Language Models

Schick, Schütze 2021

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Self Cite


To obtain high-quality sentence embeddings from pretrained language models (PLMs), they must either be augmented with additional pretraining objectives or finetuned on a large set of labeled text pairs. While the latter approach typically outperforms the former, it requires great human effort to generate suitable datasets of sufficient size. In this paper, we show how PLMs can be leveraged to obtain high-quality sentence embeddings without the need for labeled data, finetuning or modifications to the pretraining objective: We utilize the generative abilities of large and high-performing PLMs to generate entire datasets of labeled text pairs from scratch, which we then use for finetuning much smaller and more efficient models. Our fully unsupervised approach outperforms strong baselines on several semantic textual similarity datasets.


“…For the combination, some alternatives have been proposed, such as different input channels of a convolutional neural network (Kim, 2014; Zhang et al, 2016), concatenation followed by dimensionality reduction (Yin and Schütze, 2016) or averaging of embeddings (Coates and Bollegala, 2018), e.g., for combining embeddings from multiple languages (Lange et al, 2020b; Reid et al, 2020). More recently, auto-encoders (Bollegala and Bao, 2018; Wu et al, 2020), ensembles of sentence encoders (Poerner et al, 2020) and attention-based methods (Kiela et al, 2018; Lange et al, 2019a) have been introduced. The latter allows a dynamic (input-based) combination of multiple embeddings.…”

Section: Related Work (mentioning)

confidence: 99%

FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations

Lange, Adel, Strötgen, et al. 2021

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing


Combining several embeddings typically improves performance in downstream tasks as different embeddings encode different information. It has been shown that even models using embeddings from transformers still benefit from the inclusion of standard word embeddings. However, the combination of embeddings of different types and dimensions is challenging. As an alternative to attention-based meta-embeddings, we propose feature-based adversarial meta-embeddings (FAME) with an attention function that is guided by features reflecting word-specific properties, such as shape and frequency, and show that this is beneficial to handle subword-based embeddings. In addition, FAME uses adversarial training to optimize the mappings of differently-sized embeddings to the same space. We demonstrate that FAME works effectively across languages and domains for sequence labeling and sentence classification, in particular in low-resource settings. FAME sets the new state of the art for POS tagging in 27 languages, various NER settings and question classification in different domains.
