Learning Generic Sentence Representations Using Convolutional Neural Networks

Gan, Zhe; Pu, Yunchen; Henao, Ricardo; Li, Chunyuan; He, Xiaodong; Carin, Lawrence

doi:10.18653/v1/d17-1254

Cited by 78 publications

(61 citation statements)

References 43 publications

“…Note that the latest available skip-thoughts implementation returns an error on the IMDB task. 2,4,5,6 (Arora et al, 2018a; Hill et al, 2016; Gan et al, 2017; Logeswaran and Lee, 2018) Best results from publication. Table 4: Performance of document embeddings built using à la carte n-gram vectors and recent unsupervised word-level approaches on classification tasks, with the character LSTM of (Radford et al, 2017) shown for comparison.…”

Section: N-gram Embeddings For Classification (mentioning)

confidence: 99%

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

Khodak, Saunshi, Liang et al. 2018

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces à la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable using pretrained word vectors and linear regression. This transform is applicable "on the fly" in the future when a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing how the à la carte method requires fewer examples of words in context to learn high-quality embeddings and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks.

“…Recently, much effort has also been directed towards learning representations for larger pieces of text, with methods ranging from clever compositions of word embeddings (Mitchell and Lapata, 2008; De Boom et al, 2016; Arora et al, 2017; Wieting et al, 2016; Wieting and Gimpel, 2018; Zhelezniak et al, 2019) to sophisticated neural architectures (Le and Mikolov, 2014; Kiros et al, 2015; Conneau et al, 2017; Gan et al, 2017; Tang et al, 2017; Zhelezniak et al, 2018; Subramanian et al, 2018; Pagliardini et al, 2018; Cer et al, 2018).…”

Section: Introduction (mentioning)

confidence: 99%

Correlation Coefficients and Semantic Textual Similarity

Zhelezniak, Savkov, Shen et al. 2019

Proceedings of the 2019 Conference of the North

A large body of research into semantic textual similarity has focused on constructing state-of-the-art embeddings using sophisticated modelling, careful choice of learning signals and many clever tricks. By contrast, little attention has been devoted to similarity measures between these embeddings, with cosine similarity being used unquestionably in the majority of cases. In this work, we illustrate that for all common word vectors, cosine similarity is essentially equivalent to the Pearson correlation coefficient, which provides some justification for its use. We thoroughly characterise cases where Pearson correlation (and thus cosine similarity) is unfit as a similarity measure. Importantly, we show that Pearson correlation is appropriate for some word vectors but not others. When it is not appropriate, we illustrate how common nonparametric rank correlation coefficients can be used instead to significantly improve performance. We support our analysis with a series of evaluations on word-level and sentence-level semantic textual similarity benchmarks. On the latter, we show that even the simplest averaged word vectors compared by rank correlation easily rival the strongest deep representations compared by cosine similarity.
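The relationship between cosine similarity and Pearson correlation mentioned in this abstract can be illustrated with a short sketch. It relies on the standard identity that Pearson correlation is the cosine similarity of mean-centred vectors, so the two coincide when the components of each vector average to zero. The vectors `u` and `v` are made-up toy values, not data from the paper, and `spearman` is a simple rank correlation that assumes no tied values.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation: cosine similarity after centring each vector."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


def ranks(x):
    """Ranks 1..n of the components (assumes no ties, for simplicity)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    result = [0.0] * len(x)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy vectors (hypothetical); the components of each sum to zero.
u = [0.5, -1.0, 0.25, 0.75, -0.5]
v = [1.0, -0.5, -0.25, 0.5, -0.75]

# Adding a constant offset leaves Pearson unchanged but changes cosine.
u_shifted = [a + 10.0 for a in u]

print(cosine(u, v), pearson(u, v))
print(cosine(u_shifted, v), pearson(u_shifted, v))
print(spearman(u, v))
```

For the zero-mean toy vectors the first line prints two identical values, while the shifted vector shows how a non-zero component mean pulls cosine similarity away from Pearson correlation. The rank-based measure depends only on the ordering of the components, so it is unaffected by any monotone rescaling of a vector.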

“…Unsupervised combined models. The results of the individual models (Gan et al, 2017) are not promising. To get better performance, they train two separate models on the same corpus and then combine the latent representations together.…”

Section: Evaluation Results (mentioning)

confidence: 94%

Learning Universal Sentence Representations with Mean-Max Attention Autoencoder

Zhang, Li et al. 2018

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In order to learn universal sentence representations, previous methods focus on complex recurrent neural networks or supervised learning. In this paper, we propose a mean-max attention autoencoder (mean-max AAE) within the encoder-decoder framework. Our autoencoder relies entirely on the multi-head self-attention mechanism to reconstruct the input sequence. In the encoding we propose a mean-max strategy that applies both mean and max pooling operations over the hidden vectors to capture diverse information of the input. To enable the information to steer the reconstruction process dynamically, the decoder performs attention over the mean-max representation. By training our model on a large collection of unlabelled data, we obtain high-quality representations of sentences. Experimental results on a broad range of 10 transfer tasks demonstrate that our model outperforms the state-of-the-art unsupervised single methods, including the classical skip-thoughts and the advanced skip-thoughts+LN model (Ba et al., 2016). Furthermore, compared with the traditional recurrent neural network, our mean-max AAE greatly reduces the training time.
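The mean-max strategy described in this abstract can be sketched in a few lines. The example below pools a sequence of hidden vectors with both a per-dimension mean and a per-dimension max. Joining the two pooled vectors by concatenation is an assumption made here for illustration, as are the function name `mean_max_pool` and the toy hidden states; the self-attention encoder and the attentive decoder are not modelled.

```python
def mean_max_pool(hidden):
    """Pool T hidden vectors of size d into a single vector of size 2 * d.

    The first d entries are the per-dimension mean over time steps and the
    last d entries are the per-dimension max. Concatenation is an assumption
    of this sketch, not something the abstract states.
    """
    if not hidden:
        raise ValueError("need at least one hidden vector")
    # Transpose so each tuple holds one dimension across all time steps.
    columns = list(zip(*hidden))
    mean_part = [sum(column) / len(column) for column in columns]
    max_part = [max(column) for column in columns]
    return mean_part + max_part


# Toy hidden states (hypothetical): 3 time steps, 2 dimensions each.
hidden_states = [
    [1.0, 4.0],
    [3.0, 0.0],
    [2.0, 2.0],
]

sentence_vector = mean_max_pool(hidden_states)
print(sentence_vector)
```

Mean pooling summarises the whole sequence while max pooling keeps the strongest activation in each dimension, which is one way to read the abstract's claim that the two together capture more diverse information than either alone.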
