Proceedings of the 17th International Conference on Computational Linguistics - 1998
DOI: 10.3115/980432.980696

Automatic retrieval and clustering of similar words

Abstract: Bootstrapping semantics from text is one of the greatest challenges in natural language learning. We first define a word similarity measure based on the distributional pattern of words. The similarity measure allows us to construct a thesaurus using a parsed corpus. We then present a new evaluation methodology for the automatically constructed thesaurus. The evaluation results show that the thesaurus is significantly closer to WordNet than Roget Thesaurus is.

Cited by 643 publications (822 citation statements)
References 7 publications (11 reference statements)
“…The methods implemented in the WordNet::Similarity software package (Pedersen et al 2004) determine how close two words are in WordNet. These methods are J&C (Jiang and Conrath 1997), Res (Resnik 1995), Lin (Lin 1998a), W&P (Wu and Palmer 1994), L&C (Leacock and Chodorow 1998), H&SO (Hirst and St-Onge 1998), Path (counts edges between synsets), Lesk (Banerjee and Pedersen 2002), and finally Vector and Vector Pair (Patwardhan et al 2003). The measure most similar to the edgeScore method is the Path measure in WordNet.…”
Section: Parse Wikipedia With Minipar (Lin 1998a)
Citation type: mentioning
confidence: 99%
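
The quoted statement describes the Path measure as counting edges between synsets. A minimal sketch of that idea follows, assuming a tiny hand-made taxonomy (the `EDGES` list is hypothetical, not WordNet data) and assuming the common convention of scoring a pair as 1 / (1 + number of edges on the shortest path). It illustrates edge counting in general and is not the WordNet::Similarity implementation.

```python
from collections import deque

# Hypothetical toy taxonomy (undirected hypernym links); not real WordNet data.
EDGES = [
    ("entity", "animal"),
    ("entity", "artifact"),
    ("animal", "dog"),
    ("animal", "cat"),
    ("artifact", "car"),
]

def build_graph(edges):
    """Build an undirected adjacency map from a list of (node, node) links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path_edges(graph, start, goal):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def path_similarity(graph, a, b):
    """Assumed scoring convention: 1 / (1 + edges on the shortest path)."""
    edges = shortest_path_edges(graph, a, b)
    return None if edges is None else 1.0 / (1 + edges)

GRAPH = build_graph(EDGES)
```

In this toy graph, `dog` and `cat` are two edges apart (via `animal`), while `dog` and `car` are four edges apart (via `entity`), so the first pair scores higher.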
“…We used Wikipedia as a source of data and parsed it with MINIPAR (Lin 1998a). The choice of dependency triples instead of all neighbouring words favours contexts which most directly affect a word's meaning.…”
Section: Building a Word-context Matrix For Semantic Relatedness
Citation type: mentioning
confidence: 99%
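
The statement above uses dependency triples, rather than all neighbouring words, as the contexts of a word-context matrix. A rough sketch of that construction follows. The `TRIPLES` list is invented for illustration (it is not parser output), and cosine similarity is used here only as a simple stand-in; it is not the similarity measure defined in the cited paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (head, relation, dependent) triples, as a dependency parser might emit.
TRIPLES = [
    ("drink", "obj", "wine"),
    ("drink", "obj", "beer"),
    ("drink", "obj", "beer"),
    ("pour", "obj", "wine"),
    ("pour", "obj", "beer"),
    ("drive", "obj", "car"),
    ("park", "obj", "car"),
]

def build_matrix(triples):
    """Map each dependent word to a sparse count vector over (relation, head) contexts."""
    matrix = defaultdict(Counter)
    for head, rel, dep in triples:
        matrix[dep][(rel, head)] += 1
    return matrix

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[ctx] for ctx, count in u.items() if ctx in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

MATRIX = build_matrix(TRIPLES)
```

With these toy counts, `wine` and `beer` share both of their contexts and score close to 1, whereas `wine` and `car` share none and score 0.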
“…the phenomenon that errors in previous iterations have a deteriorating effect on the accuracy of later iterations (McIntosh and Curran, 2009). To dampen this effect, distributional similarity (Lin, 1998; van der Plas, 2008) was used to filter instance pairs where the first element is not distributionally similar to the group of soccer players or where the second element is not similar to soccer clubs. The results for this method are given in the final two columns.…”
Section: Capital-of # Patterns # Pairs (P) 1st Ans Ok Mrr
Citation type: mentioning
confidence: 99%
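
The filtering step quoted above can be sketched as follows: a candidate pair is kept only if its first element resembles one seed group and its second element resembles another. Everything in this sketch is an assumption for illustration, including the `CONTEXTS` table, the placeholder word names, the use of Jaccard overlap as the similarity, and the 0.3 threshold. The quoted work relies on distributional similarity in the sense of Lin (1998) and van der Plas (2008), which this does not reproduce.

```python
# Hypothetical context sets per word; in practice these would come from a parsed corpus.
CONTEXTS = {
    "player_a": {"scores", "plays_for", "dribbles"},
    "player_b": {"scores", "plays_for", "shoots"},
    "player_c": {"plays_for", "dribbles", "passes"},
    "club_x": {"signs", "wins_league", "based_in"},
    "club_y": {"signs", "wins_league", "loses"},
    "city_z": {"capital_of", "based_in", "visits"},
}

def jaccard(a, b):
    """Overlap of two context sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_similarity(word, group):
    """Mean Jaccard similarity between a word and each member of a seed group."""
    ctx = CONTEXTS.get(word, set())
    return sum(jaccard(ctx, CONTEXTS[g]) for g in group) / len(group)

def filter_pairs(pairs, left_seeds, right_seeds, threshold=0.3):
    """Keep pairs whose elements are each similar enough to their seed group."""
    return [
        (x, y)
        for x, y in pairs
        if group_similarity(x, left_seeds) >= threshold
        and group_similarity(y, right_seeds) >= threshold
    ]

PLAYERS = ["player_a", "player_b"]
CLUBS = ["club_x", "club_y"]
CANDIDATES = [("player_c", "club_x"), ("player_c", "city_z"), ("city_z", "club_y")]
KEPT = filter_pairs(CANDIDATES, PLAYERS, CLUBS)
```

Here only `("player_c", "club_x")` survives: `city_z` shares too few contexts with either seed group, so both pairs that contain it are dropped.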
“…We used three sources to automatically expand the Seed Lexicon: WordNet [3], Lin's distributional thesaurus [4], and a pivot-based paraphrase generation tool [5]. The resulting lexicons will be called Raw WN, Raw Lin, and Raw Para, respectively; they were created as follows.…”
Section: Automatically Expanding the Seed Lexicon
Citation type: mentioning
confidence: 99%