Despite their ubiquity, word embeddings trained with skip-gram negative sampling (SGNS) remain poorly understood. We find that vector positions are not simply determined by semantic similarity, but rather occupy a narrow cone, diametrically opposed to the context vectors. We show that this geometric concentration depends on the ratio of positive to negative examples, and that it is neither theoretically nor empirically inherent in related embedding algorithms.
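As an illustration of how this concentration could be probed (a minimal sketch, not the paper's exact measurement), the numpy snippet below compares cosine similarities within the word vectors and between word and context vectors. The matrices here are random stand-ins; with gensim, the analogous arrays would be `model.wv.vectors` and `model.syn1neg` from a trained Word2Vec model, for which the first statistic is typically large and positive (the narrow cone) and the second negative (the diametric opposition).

```python
import numpy as np

# Sketch of probing SGNS geometry. The matrices are random stand-ins;
# with gensim, the analogous arrays are model.wv.vectors (word vectors)
# and model.syn1neg (context vectors) of a trained Word2Vec model.
rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 100))   # word (input) vectors
C = rng.normal(size=(1000, 100))   # context (output) vectors

def unit_rows(M):
    return M / np.linalg.norm(M, axis=1, keepdims=True)

Wn, Cn = unit_rows(W), unit_rows(C)

# Mean pairwise cosine among word vectors: for real SGNS vectors this is
# large and positive, the signature of a narrow cone around a shared direction.
mean_ww = (Wn @ Wn.T).mean()

# Mean cosine between each word vector and its context vector: for real SGNS
# vectors this is negative, i.e. the two sets sit opposite the origin.
mean_wc = (Wn * Cn).sum(axis=1).mean()

print(f"mean word-word cosine:    {mean_ww:.3f}")   # ~0 here; >0 for real SGNS
print(f"mean word-context cosine: {mean_wc:.3f}")   # ~0 here; <0 for real SGNS
```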
Program equivalence is a fundamental problem with practical applications across many areas of computing, including compilation, optimization, software synthesis, and formal verification. Equivalence is undecidable in general, but in certain settings it is possible to develop domain-specific languages that are expressive enough to be practical and yet sufficiently restricted that equivalence remains decidable. In previous work we introduced NetKAT, a domain-specific language for specifying and verifying network packet-processing functions. NetKAT provides familiar constructs such as tests, assignments, union, sequential composition, and iteration, as well as custom primitives for modifying packet headers and encoding network topologies. Semantically, NetKAT is based on Kleene algebra with tests (KAT) and comes equipped with a sound and complete equational theory. Although NetKAT equivalence is decidable, the best known algorithm is hardly practical: it uses Savitch's theorem to determinize a PSPACE algorithm and requires quadratic space. This paper presents a new algorithm for deciding NetKAT equivalence. The algorithm is based on finding bisimulations between finite automata constructed from NetKAT programs. We investigate the coalgebraic theory of NetKAT, generalize the notion of Brzozowski derivatives to NetKAT, develop efficient representations of NetKAT automata in terms of spines and sparse matrices, and discuss the highlights of our prototype implementation.
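The bisimulation idea at the heart of such a procedure can be sketched with a Hopcroft-Karp-style union-find check. The snippet below operates on ordinary DFAs rather than NetKAT automata (whose states, spines, and sparse-matrix transitions are considerably richer), so treat it as a schematic of the technique, not the paper's implementation.

```python
# Schematic Hopcroft-Karp-style bisimulation check between two DFAs that
# share one transition table over disjoint state sets. States are ints,
# `delta` maps (state, symbol) -> state, `accept` is the set of accepting
# states. Two states are equivalent iff no reachable pair disagrees on
# acceptance; union-find merges pairs assumed equivalent so far.
def bisimilar(start1, start2, delta, accept, alphabet):
    parent = {}

    def find(s):
        parent.setdefault(s, s)
        while parent[s] != s:
            parent[s] = parent[parent[s]]   # path halving
            s = parent[s]
        return s

    def union(a, b):
        parent[find(a)] = find(b)

    todo = [(start1, start2)]
    while todo:
        p, q = todo.pop()
        if find(p) == find(q):
            continue                        # already assumed equivalent
        if (p in accept) != (q in accept):
            return False                    # observable difference found
        union(p, q)
        for a in alphabet:                  # explore successor pairs
            todo.append((delta[(p, a)], delta[(q, a)]))
    return True

# Two equivalent DFAs over {0,1}, each accepting strings with an odd number
# of 1s: states {0,1} for the first automaton, {2,3} for the second.
delta = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 1, (1, '1'): 0,
         (2, '0'): 2, (2, '1'): 3, (3, '0'): 3, (3, '1'): 2}
print(bisimilar(0, 2, delta, accept={1, 3}, alphabet='01'))  # True
```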
Phrase representations derived from BERT often do not exhibit complex phrasal compositionality, as the model relies instead on lexical similarity to determine semantic relatedness. In this paper, we propose a contrastive fine-tuning objective that enables BERT to produce more powerful phrase embeddings. Our approach (Phrase-BERT) relies on a dataset of diverse phrasal paraphrases, which is automatically generated using a paraphrase generation model, as well as a large-scale dataset of phrases in context mined from the Books3 corpus. Phrase-BERT outperforms baselines across a variety of phrase-level similarity tasks, while also demonstrating increased lexical diversity among nearest neighbors in the vector space. Finally, as a case study, we show that Phrase-BERT embeddings can be easily integrated with a simple autoencoder to build a phrase-based neural topic model that interprets topics as mixtures of words and phrases by performing a nearest neighbor search in the embedding space. Crowdsourced evaluations demonstrate that this phrase-based topic model produces more coherent and meaningful topics than baseline word and phrase-level topic models, further validating the utility of Phrase-BERT.
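A triplet-style contrastive objective of the general kind described here can be sketched in a few lines of PyTorch. The encoder outputs, margin, and batch below are placeholders, not Phrase-BERT's actual training configuration.

```python
import torch
import torch.nn.functional as F

# Triplet-style contrastive objective over phrase embeddings. The random
# tensors stand in for encoder outputs (e.g. pooled BERT token states); the
# margin and batch are placeholders, not Phrase-BERT's training setup.
def contrastive_loss(anchor, positive, negative, margin=1.0):
    # Pull each anchor toward its paraphrase and away from an unrelated phrase.
    d_pos = 1 - F.cosine_similarity(anchor, positive)
    d_neg = 1 - F.cosine_similarity(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy batch: 8 phrase embeddings of dimension 768 per role.
anchor, positive, negative = (torch.randn(8, 768) for _ in range(3))
print(contrastive_loss(anchor, positive, negative).item())
```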
Duplicate documents are a pervasive problem in text datasets and can have a strong effect on unsupervised models. Methods to remove duplicate texts are typically heuristic or very expensive, so it is vital to know when and why they are needed. We measure the sensitivity of two latent semantic methods to the presence of different levels of document repetition. By artificially creating different forms of duplicate text we confirm several hypotheses about how repeated text impacts models. While a small amount of duplication is tolerable, substantial over-representation of subsets of the text may overwhelm meaningful topical patterns.
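As a concrete example of the cheap end of the heuristic spectrum the abstract mentions, the sketch below removes exact duplicates by hashing normalized text; near-duplicate detection (e.g. shingling with MinHash) is the more expensive counterpart and is not shown.

```python
import hashlib

# Exact-duplicate removal by hashing whitespace/case-normalized text.
# This is the cheapest heuristic; near-duplicate detection (shingling,
# MinHash) is substantially more expensive.
def deduplicate(docs):
    seen, unique = set(), []
    for doc in docs:
        key = hashlib.md5(" ".join(doc.lower().split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

docs = ["The cat sat.", "the  CAT sat.", "A different document."]
print(deduplicate(docs))   # case/whitespace variants collapse to one copy
```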