Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018)
DOI: 10.18653/v1/n18-1048

Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources

Abstract: Word vector specialisation (also known as retrofitting) is a portable, light-weight approach to fine-tuning arbitrary distributional word vector spaces by injecting external knowledge from rich lexical resources such as WordNet. By design, these post-processing methods only update the vectors of words occurring in external lexicons, leaving the representations of all unseen words intact. In this paper, we show that constraint-driven vector space specialisation can be extended to unseen words. We propose a novel …

Cited by 31 publications (21 citation statements) · References 57 publications

Citation statements (ordered by relevance):
“…Our approach utilizes synonym sets in UMLS to learn name representations, while also enforcing the learned representations to be similar to their contextual and conceptual representations. The idea is related to word vector specialization (retrofitting) (Faruqui et al., 2015; Mrkšić et al., 2017; Vulić et al., 2018). The difference is that we focus on learning representations for multi-word concept names, hence the contextual and conceptual constraints are essential, in addition to the synonymous similarity.…”
Section: Average of Contextual Word Embeddings
confidence: 99%
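The retrofitting idea referenced in this statement (Faruqui et al., 2015) can be pictured with a short sketch: each word vector is iteratively nudged toward its lexicon neighbours while staying close to its original distributional estimate. The function below is an illustrative reconstruction under those assumptions, not the cited papers' code; the names, weights, and iteration count are placeholders.

```python
# Minimal retrofitting sketch in the spirit of Faruqui et al. (2015): each word
# vector is moved toward the average of its lexicon neighbours while staying
# close to its original distributional vector. Weights are illustrative.
import numpy as np

def retrofit(vectors, lexicon, n_iters=10, alpha=1.0, beta=1.0):
    """vectors: dict word -> np.ndarray; lexicon: dict word -> list of related words."""
    new_vecs = {w: v.copy() for w, v in vectors.items()}
    for _ in range(n_iters):
        for word, neighbours in lexicon.items():
            neighbours = [n for n in neighbours if n in new_vecs]
            if word not in new_vecs or not neighbours:
                continue  # only words covered by both the lexicon and the vector space move
            # closed-form coordinate update: weighted mix of the original vector
            # and the current vectors of the lexicon neighbours
            total = alpha * vectors[word] + beta * sum(new_vecs[n] for n in neighbours)
            new_vecs[word] = total / (alpha + beta * len(neighbours))
    return new_vecs
```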
“…The techniques for creating these word vectors follow the distributional hypothesis [11] by capturing distributional regularities [4], so that the distributional semantic and syntactic similarities encoded in the word vectors represent the properties that arise from multiple co-occurrences in a large training corpus. As a result, these representations tend to treat similarity very broadly, e.g., conflating synonyms and antonyms [12]. Moreover, the generality or specialization of the representations is a reflection of the data used to train them.…”
Section: Related Work and Motivation
confidence: 99%
“…In our case, the aim is not to use the word embeddings directly to solve a task, but rather to employ them for encoding inputs for the input layer of an LSTM classifier. Such post-processing techniques can also be used to improve performance in downstream tasks that use word embeddings [12]. The general idea is also an attractive one from an applied perspective: if a light-weight post-processing technique, injecting knowledge from a lexical or linguistic resource, can improve general word embeddings for a domain-specific task, then that would help solve some of the problems related to data scarcity in domain-specific applications [6,7].…”
Section: Related Work and Motivation
confidence: 99%
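As a rough illustration of the set-up this statement describes, the snippet below initialises an LSTM classifier's input layer with post-processed (specialised) word vectors. PyTorch is assumed, and the class name, dimensions, and class count are hypothetical rather than taken from the cited work.

```python
# Sketch: feeding (post-)specialised word vectors into an LSTM classifier by
# loading them into the embedding layer. All sizes are placeholders.
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, embedding_matrix, hidden_dim=128, num_classes=2):
        super().__init__()
        # embedding_matrix: FloatTensor of shape (vocab_size, emb_dim) holding
        # the retrofitted / post-specialised vectors
        self.embedding = nn.Embedding.from_pretrained(embedding_matrix, freeze=True)
        self.lstm = nn.LSTM(embedding_matrix.size(1), hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):          # token_ids: (batch, seq_len) LongTensor
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.lstm(embedded)  # h_n: (1, batch, hidden_dim)
        return self.out(h_n[-1])           # class logits
```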
“…Faruqui et al (2015) retrofit embeddings with an efficient iterative updating method to reduce the distances between synonyms derived from WordNet. Vulić et al (2018) and Glavaš and Vulić (2018) propose to learn specialization functions of seen words in semantic lexicons and propagate it to unseen words. Much research work (Mrkšić et al 2016;Glavaš and Vulić 2018) utilizes antonyms to further differentiate the dissimilar words in addition to pulling the representation of synonyms words close.…”
Section: Word Representation Specialization
confidence: 99%
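A compact way to picture the attract/repel idea mentioned here (pulling synonyms close while pushing antonyms apart) is a margin-style loss over cosine similarities. The sketch below is a simplified stand-in, not the exact published objective of Mrkšić et al. (2016); the pair tensors and margin are assumed inputs.

```python
# Simplified attract/repel-style objective: raise cosine similarity of synonym
# pairs toward 1 and push antonym pairs below a margin. Illustrative only.
import torch
import torch.nn.functional as F

def attract_repel_loss(vectors, syn_pairs, ant_pairs, ant_margin=0.0):
    """vectors: (vocab, dim) parameter tensor; *_pairs: LongTensor of shape (n, 2)."""
    v = F.normalize(vectors, dim=1)                        # work in cosine geometry
    syn_sim = (v[syn_pairs[:, 0]] * v[syn_pairs[:, 1]]).sum(dim=1)
    ant_sim = (v[ant_pairs[:, 0]] * v[ant_pairs[:, 1]]).sum(dim=1)
    attract = (1.0 - syn_sim).clamp(min=0.0).mean()        # push synonym cosine toward 1
    repel = (ant_sim - ant_margin).clamp(min=0.0).mean()   # push antonym cosine below the margin
    return attract + repel
```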
“…Taking English WordNet as an example, it contains only 155K words organized in 176K synsets, which is rather small compared to the large vocabulary of the training data. Vulić et al. (2018) and Glavaš and Vulić (2018) partially solve this problem by first designing a mapping function that learns the specialization process for seen words, and then applying the learned function to words unseen in semantic lexicons. Unfortunately, their approaches still depend on the linguistic constraints derived from manually created resources.…”
Section: Introduction
confidence: 99%
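The post-specialisation strategy summarised in this statement (learn a specialisation function on seen words, then apply it to unseen words) can be sketched as a simple supervised mapping. The code below uses an off-the-shelf MLP regressor as a stand-in for the published global mapping functions of Vulić et al. (2018) and Glavaš and Vulić (2018); all names and hyperparameters are illustrative.

```python
# Sketch of post-specialisation: fit a mapping from original to specialised
# vectors on words covered by the lexicon ("seen"), then apply it to words the
# lexicon misses ("unseen"). A small MLP regressor stands in for the published
# specialisation functions.
import numpy as np
from sklearn.neural_network import MLPRegressor

def post_specialise(original, specialised, seen_words, unseen_words):
    """original/specialised: dict word -> np.ndarray (specialised covers seen words only)."""
    X = np.stack([original[w] for w in seen_words])
    Y = np.stack([specialised[w] for w in seen_words])
    mapper = MLPRegressor(hidden_layer_sizes=(512,), max_iter=500)
    mapper.fit(X, Y)                        # learn the seen-word specialisation function
    X_unseen = np.stack([original[w] for w in unseen_words])
    Y_unseen = mapper.predict(X_unseen)     # propagate specialisation to unseen words
    return {w: Y_unseen[i] for i, w in enumerate(unseen_words)}
```

The choice of regressor is incidental; the point of the sketch is the train-on-seen, apply-to-unseen split that distinguishes post-specialisation from plain retrofitting.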