Following up on numerous reports of analogy-based identification of "linguistic regularities" in word embeddings, this study applies the widely used vector offset method to four types of linguistic relations: inflectional and derivational morphology, and lexicographic and encyclopedic semantics. We present a balanced test set with 99,200 questions in 40 categories, and we systematically examine how accuracy for different categories is affected by the window size and dimensionality of SVD-based word embeddings. We also show that GloVe and SVD yield similar patterns of results across categories, offering further evidence of conceptual similarity between count-based and neural-net-based models.
This paper explores the possibilities of analogical reasoning with vector space models. Given two pairs of words with the same relation (e.g. man:woman :: king:queen), it was proposed that the offset between the corresponding word vectors of one pair can be used to identify the unknown member of the other pair ($\overrightarrow{king} - \overrightarrow{man} + \overrightarrow{woman} \approx \overrightarrow{queen}$). We argue against such "linguistic regularities" both as a model for linguistic relations in vector space models and as a benchmark, and we show that the vector offset method (as well as two other, better-performing methods) suffers from a dependence on vector similarity.
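Both abstracts center on the same vector offset method (often called 3CosAdd). As a minimal sketch of how it is typically computed, assuming unit-normalized embeddings and using random toy vectors in place of real GloVe or SVD ones:

```python
import numpy as np

# Toy vocabulary with random unit-norm vectors; a real evaluation would
# load pre-trained GloVe or SVD embeddings instead.
vocab = ["king", "man", "woman", "queen", "apple"]
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 50))
E /= np.linalg.norm(E, axis=1, keepdims=True)
idx = {w: i for i, w in enumerate(vocab)}

def analogy_3cosadd(a, b, c):
    """Answer a:b :: c:? by returning the word whose vector is closest
    (by cosine) to b - a + c, excluding the three query words."""
    target = E[idx[b]] - E[idx[a]] + E[idx[c]]
    target /= np.linalg.norm(target)
    scores = E @ target               # cosine similarity: rows are unit-norm
    for w in (a, b, c):               # conventional exclusion of query words
        scores[idx[w]] = -np.inf
    return vocab[int(np.argmax(scores))]

# man:king :: woman:? -> ideally "queen" with real embeddings
print(analogy_3cosadd("man", "king", "woman"))
```

The explicit exclusion of the three query words is worth noting: without it, the nearest neighbor of b − a + c is very often one of the inputs themselves, which is one facet of the dependence on vector similarity that the second abstract critiques.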
Autonomous agents must often detect affordances: the set of behaviors enabled by a situation. Affordance detection is particularly helpful in domains with large action spaces, allowing the agent to prune its search space by avoiding futile behaviors. This paper presents a method for affordance extraction via word embeddings trained on a tagged Wikipedia corpus. The resulting word vectors are treated as a common knowledge database which can be queried using linear algebra. We apply this method to a reinforcement learning agent in a text-only environment and show that affordance-based action selection improves performance in most cases. Our method increases the computational complexity of each learning step but significantly reduces the total number of steps needed. In addition, the agent's action selections begin to resemble those a human would choose.
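The abstract does not spell out the query itself, but one plausible linear-algebra realization is an analogy-style lookup: rank candidate verbs by how well they complete an exemplar noun:verb pair. The exemplar pair, vocabulary, and random vectors below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Illustrative stand-ins for embeddings trained on a tagged corpus.
words = ["sword", "song", "sing", "eat", "open", "swing", "read"]
rng = np.random.default_rng(1)
vecs = rng.normal(size=(len(words), 50))
E = {w: v / np.linalg.norm(v) for w, v in zip(words, vecs)}

def affordant_verbs(noun, candidates, exemplar=("song", "sing"), k=2):
    """Rank candidate verbs for `noun` via the analogy
    exemplar_noun:exemplar_verb :: noun:?, i.e. by cosine similarity
    to noun + (exemplar_verb - exemplar_noun)."""
    e_noun, e_verb = exemplar
    target = E[noun] + E[e_verb] - E[e_noun]
    target /= np.linalg.norm(target)
    return sorted(candidates, key=lambda v: -float(E[v] @ target))[:k]

# Prune a large action space to the k most plausible verbs per object.
print(affordant_verbs("sword", ["eat", "open", "swing", "read"]))
```

An agent would then restrict its search to the returned verbs, trading a small per-step similarity computation for far fewer wasted actions, consistent with the trade-off the abstract reports.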
Subword-level information is crucial for capturing the meaning and morphology of words, especially for out-of-vocabulary entries. We propose CNN- and RNN-based subword-level composition functions for learning word embeddings, and systematically compare them with popular word-level and subword-level models (Skip-Gram and FastText). Additionally, we propose a hybrid training scheme in which a pure subword-level model is trained jointly with a conventional word-level embedding model based on lookup tables. This increases the fitness of all types of subword-level word embeddings; the word-level embeddings can be discarded after training, leaving only a compact subword-level representation with a much smaller data volume. We evaluate these embeddings on a set of intrinsic and extrinsic tasks, showing that subword-level models have an advantage on tasks related to morphology and on datasets with high OOV rates, and that they can be combined with other types of embeddings.
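As a hedged sketch of what a CNN-based subword-level composition function can look like (the architecture and hyperparameters here are generic placeholders, not necessarily the paper's): characters are embedded, convolved, and max-pooled into a single word vector.

```python
import torch
import torch.nn as nn

class CharCNNComposer(nn.Module):
    """Compose a word embedding from character embeddings with a 1-D
    convolution and max-over-time pooling. An RNN-based composer would
    replace the convolution with, e.g., a bidirectional LSTM."""
    def __init__(self, n_chars=128, char_dim=16, word_dim=100, kernel=3):
        super().__init__()
        self.chars = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, word_dim, kernel_size=kernel, padding=1)

    def forward(self, char_ids):                   # (batch, word_len)
        x = self.chars(char_ids).transpose(1, 2)   # (batch, char_dim, len)
        h = torch.relu(self.conv(x))               # (batch, word_dim, len)
        return h.max(dim=2).values                 # pool over positions

composer = CharCNNComposer()
ids = torch.tensor([[ord(c) for c in "unfathomable"]])  # toy ASCII ids
word_vec = composer(ids)                                # shape (1, 100)
```

Under the hybrid scheme the abstract describes, such a composer would be trained jointly with a standard lookup-table embedding (for instance, by tying both to the same Skip-Gram objective); after training, the lookup table can be dropped, since the composer alone can produce a vector for any string, including OOV words.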
The number of word embedding models grows every year. Most are based on the co-occurrence information of words and their contexts, yet it remains an open question what the best definition of context is. We provide a systematic investigation of four different syntactic context types and context representations for learning word embeddings. Comprehensive experiments are conducted to evaluate their effectiveness on six intrinsic and extrinsic tasks. We hope that this paper, along with the published code, will be helpful for choosing the best context type and representation for a given task.
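A minimal sketch of one axis such an investigation varies: linear window contexts under an "unbound" (plain word) versus a "bound" (position-tagged) representation. The function and tagging scheme are illustrative assumptions; dependency-based context types would additionally require a parser.

```python
def window_contexts(tokens, window=2, bound=False):
    """Yield (word, context) training pairs from a linear window.
    bound=True tags each context with its relative position, giving
    a "bound" representation; bound=False keeps plain words."""
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield (w, tokens[j] if not bound else f"{tokens[j]}_{j - i:+d}")

print(list(window_contexts(["the", "cat", "sat"], window=1, bound=True)))
# [('the', 'cat_+1'), ('cat', 'the_-1'), ('cat', 'sat_+1'), ('sat', 'cat_-1')]
```

The (word, context) pairs produced this way would then be fed to any co-occurrence-based embedding model, which is what makes the choice of context type and representation an independent experimental variable.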