Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2014
DOI: 10.3115/v1/p14-2050
Dependency-Based Word Embeddings

Abstract: While continuous word embeddings are gaining popularity, current models are based solely on linear contexts. In this work, we generalize the skip-gram model with negative sampling introduced by Mikolov et al. to include arbitrary contexts. In particular, we perform experiments with dependency-based contexts, and show that they produce markedly different embeddings. The dependency-based embeddings are less topical and exhibit more functional similarity than the original skip-gram embeddings.
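To make the contrast between linear and dependency-based contexts concrete, here is a minimal sketch of extracting (word, context) pairs from a parse, using spaCy as an assumed parser. It is illustrative only: the authors trained with their own word2vecf tool, and details such as the preposition-collapsing step, the inverse-relation labelling, and the lowercasing below are simplifications or assumptions.

```python
# Sketch: dependency-based contexts vs. linear (window) contexts.
# Assumes spaCy with the en_core_web_sm model installed; the paper's own
# tool (word2vecf) and its preposition-collapsing step are not reproduced.
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_contexts(sentence):
    """Yield (word, context) pairs where the context is a syntactic
    neighbour annotated with its dependency relation."""
    doc = nlp(sentence)
    for token in doc:
        for child in token.children:
            # context of the head word: its modifier plus the relation label
            yield token.text.lower(), f"{child.dep_}_{child.text.lower()}"
            # context of the modifier: its head with an inverse-relation label
            yield child.text.lower(), f"{child.dep_}I_{token.text.lower()}"

def linear_contexts(sentence, window=2):
    """Yield (word, context) pairs from a plain sliding window (bag of words)."""
    words = [t.text.lower() for t in nlp(sentence) if not t.is_punct]
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                yield w, words[j]

if __name__ == "__main__":
    s = "Australian scientist discovers star with telescope"
    print(sorted(set(dependency_contexts(s))))
    print(sorted(set(linear_contexts(s))))
```

On the paper's example sentence, the verb pairs with its syntactic subject and object regardless of surface distance, while the window-based pairs also include topically related but syntactically unattached neighbours.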

Cited by 883 publications (931 citation statements); references 19 publications.
“…The most accurate system is WECE_bow, which supports the assertion by Levy and Goldberg (2014a) that bag-of-words embeddings should offer superior performance to dependency-based embeddings on tasks involving semantic relations. Carrying out an error analysis, the lowest results of the WECE systems are obtained in the domains with the fewest training instances, making it apparent that word embedding systems are dependent on the number of training instances.…”
Section: Domain-Aware Training Instances (supporting)
confidence: 75%
“…Moreover, for a fair comparison with the GraCE system, developed with dependency relations, we also tested the results obtained with a dependency-based Skip-gram model (Levy and Goldberg, 2014a). Words occurring only once in the corpus are filtered out and 200-dimensional vectors are learned.…”
Section: Word Embeddings Representations (mentioning)
confidence: 99%
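As a rough guide to how the preprocessing choices in this excerpt map onto skip-gram training hyperparameters, here is a hedged sketch using gensim. Note the caveat: gensim's Word2Vec only supports linear (window) contexts, so this is the standard baseline with matching settings, not the dependency-based variant itself, which requires pre-extracted (word, context) pairs and a tool such as the authors' word2vecf; the corpus, window size, negative-sample count, and epoch count below are assumptions.

```python
# Standard skip-gram with negative sampling in gensim, with the two settings
# mentioned in the excerpt (200-dimensional vectors, singleton words dropped).
# This uses linear contexts, NOT the dependency-based contexts of the paper.
from gensim.models import Word2Vec

corpus = [
    ["australian", "scientist", "discovers", "star", "with", "telescope"],
    ["the", "scientist", "observes", "a", "distant", "star"],
]  # toy stand-in for the real training corpus

model = Word2Vec(
    sentences=corpus,
    sg=1,             # skip-gram architecture
    negative=5,       # negative sampling (assumed value)
    vector_size=200,  # 200-dimensional vectors, as in the excerpt
    min_count=2,      # drop words occurring only once, as in the excerpt
    window=5,         # linear-context window (assumed value)
    epochs=5,
)

print(model.wv["scientist"][:5])      # first few vector components
print(model.wv.most_similar("star"))  # nearest neighbours in the toy space
```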
“…Murphy et al. (2012) represented words through their co-occurrence with other words in syntactic dependency relations, and then used the Non-Negative Sparse Embedding (NNSE) method to reduce the dimensionality of the resulting representation. Levy and Goldberg (2014) extended the skip-gram word2vec model with negative sampling (Mikolov et al., 2013b) by basing the word co-occurrence window on the dependency parse tree of the sentence. Bollegala et al. (2015) replaced bag-of-words contexts with various patterns (lexical, POS and dependency).…”
Section: Related Work (mentioning)
confidence: 99%
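As background for the extension described in this excerpt, the objective maximized by skip-gram with negative sampling can be written as below; in the dependency-based variant, the observed pairs D are (word, syntactic-context) pairs from the parse rather than window co-occurrences. The notation (k negative samples, smoothed context distribution P_D) follows the standard presentation and is not taken from the excerpt itself.

```latex
% Skip-gram with negative sampling (SGNS) objective, maximized over the
% word vectors \vec{w} and context vectors \vec{c}:
\sum_{(w,c)\in D} \left( \log \sigma(\vec{w}\cdot\vec{c})
  \; + \; k \cdot \mathbb{E}_{c_N \sim P_D}\!\left[ \log \sigma(-\vec{w}\cdot\vec{c_N}) \right] \right)
```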
“…Traditional representation learning methods aim to capture semantic and syntactic similarities between two words [52]. A graph-based learning method is used for retrofitting word embeddings by utilizing semantic lexicons [53].…”
Section: Related Work (mentioning)
confidence: 99%