Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics 2014
DOI: 10.3115/v1/e14-1057
What Substitutes Tell Us - Analysis of an "All-Words" Lexical Substitution Corpus

Abstract: We present the first large-scale English "all-words lexical substitution" corpus. The size of the corpus provides a rich resource for investigations into word meaning. We investigate the nature of lexical substitute sets, comparing them to WordNet synsets. We find them to be consistent with, but more fine-grained than, synsets. We also identify significant differences to results for paraphrase ranking in context reported for the SEMEVAL lexical substitution data. This highlights the influence of corpus construc…

Cited by 86 publications (107 citation statements)
References 19 publications
“…[results table excerpt]
All (max, no co-clustering): 0.250*, 0.695*, 0.690*
Choose-K: # WN Synsets (max, co-clustering): 0.241**, 0.690**, 0.683**
Choose-K: Optimize NMI (avg): 0.282, 0.668, 0.662
Choose-K: Optimize NMI (max, no co-clustering): 0.331*, 0.719*, 0.714***
Choose-K: Optimize NMI (max, co-clustering): 0.314**, 0.718****, 0.710**
…baseline and cluster sense inventories are capable of improving these GAP scores when we use the best sense as a filter. Syntactic models generally give very good results with small paraphrase sets (Kremer et al., 2014), but their performance seems to degrade when they need to deal with larger and noisier substitute sets (Apidianaki, 2016). Our results suggest that finding the most appropriate sense of a target word in context can improve their lexical substitution results.…”
Section: Results
confidence: 62%
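The GAP scores discussed in the statement above refer to Generalized Average Precision, the standard measure for scoring a ranked list of substitute candidates against weighted gold substitutes (Kishida, 2005). Below is a minimal Python sketch of the usual formulation; the helper name and the toy annotator counts are illustrative assumptions, not values from the cited experiments.

def gap(ranked_substitutes, gold_weights):
    """Generalized Average Precision for a ranked substitute list.

    ranked_substitutes: candidate substitutes, best first.
    gold_weights: dict mapping gold substitutes to annotator counts.
    """
    # Gold weight at each rank (0 if the candidate is not a gold substitute).
    x = [gold_weights.get(s, 0) for s in ranked_substitutes]

    # Numerator: cumulative average gold weight at every rank that hits gold.
    num, cum = 0.0, 0.0
    for i, xi in enumerate(x, start=1):
        cum += xi
        if xi > 0:
            num += cum / i

    # Denominator: the same quantity for the ideal ranking, i.e. gold
    # substitutes sorted by decreasing weight.
    den, cum = 0.0, 0.0
    for j, yj in enumerate(sorted(gold_weights.values(), reverse=True), start=1):
        cum += yj
        den += cum / j

    return num / den if den > 0 else 0.0

# Hypothetical example: annotators proposed "strong" twice and "powerful" once.
print(gap(["strong", "intense", "powerful"], {"strong": 2, "powerful": 1}))  # ~0.857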
“…The first is the "Concepts in Context" (CoInCo) corpus (Kremer et al., 2014), containing over 15K sentences corresponding to nearly 4K unique target words. We divide the CoInCo dataset into development and test sets by first finding all target words that have at least 10 sentences.…”
Section: Datasets
confidence: 99%
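As a rough illustration of the development/test split described in the statement above (keeping only target words that have at least 10 sentences and then dividing per target word), here is a hedged Python sketch. The record format, the helper name split_by_target, and the 50/50 per-target split are assumptions for illustration, not the authors' exact procedure.

from collections import defaultdict

def split_by_target(instances, min_sentences=10, dev_fraction=0.5):
    """instances: iterable of (target_word, sentence) pairs."""
    by_target = defaultdict(list)
    for target, sentence in instances:
        by_target[target].append(sentence)

    dev, test = [], []
    for target, sentences in by_target.items():
        if len(sentences) < min_sentences:
            continue  # drop targets with too few sentences, as described above
        cut = int(len(sentences) * dev_fraction)
        dev.extend((target, s) for s in sentences[:cut])
        test.extend((target, s) for s in sentences[cut:])
    return dev, test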
“…However, as also reported in Kremer et al. (2014), the performance gain achieved by taking the given context into consideration is smaller than in LS07. Again, this seems to be due to the nature of LS14, which is not biased to ambiguous target words.…”
Section: Results
confidence: 73%
“…A more recent dataset (Kremer et al., 2014), denoted LS14, provides the same kind of data as LS07, but instead of target words that were specifically selected to be ambiguous as in LS07, the target words here are simply all the content words in text documents extracted from news and fiction corpora. LS14 is also much larger than LS07, with over 15K target word instances.…”
Section: Lexical Substitution Datasets
confidence: 99%
“…Lexical substitution systems perform substitute ranking in context using vector-space models (Thater et al., 2011; Kremer et al., 2014; Melamud et al., 2015). Recently, Apidianaki (2016) showed that a syntax-based substitution model can successfully filter the paraphrases available in the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013) to select the ones that are adequate in specific contexts.…”
Section: Related Work
confidence: 99%
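To make "substitute ranking in context using vector-space models" concrete, here is a minimal Python sketch in the spirit of the additive cosine measure of Melamud et al. (2015): each candidate substitute is scored by averaging its similarity to the target word with its similarities to the words in the context window. The function names are assumptions and the vectors are placeholders; real systems use embeddings trained on large corpora.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_substitutes(target_vec, context_vecs, candidates):
    """candidates: dict mapping a substitute word to its embedding vector."""
    scores = {}
    for word, vec in candidates.items():
        context_sim = sum(cosine(vec, c) for c in context_vecs)
        # Average target similarity with the per-context-word similarities.
        scores[word] = (cosine(vec, target_vec) + context_sim) / (len(context_vecs) + 1)
    return sorted(scores, key=scores.get, reverse=True)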