Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL - ACL '06 2006
DOI: 10.3115/1220175.1220302
Novel association measures using web search with double checking

Abstract: A web search with double checking model is proposed to explore the web as a live corpus. Five association measures including variants of Dice, Overlap Ratio, Jaccard, and Cosine, as well as CoOccurrence Double Check (CODC), are presented. In the experiments on Rubenstein-Goodenough's benchmark data set, the CODC measure achieves correlation coefficient 0.8492, which competes with the performance (0.8914) of the model using WordNet. The experiments on link detection of named entities using the strategies of dir…
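The abstract lists association measures built from bidirectional "double check" snippet counts. The sketch below illustrates how such counts could feed Dice-, Jaccard-, Overlap-, and Cosine-style scores; the exact variants defined in the paper may differ, and the count names (`f_x_at_y` etc.) are hypothetical placeholders, not the authors' notation.

```python
import math

def association_measures(f_x_at_y, f_y_at_x, f_x, f_y):
    """Illustrative association measures over double-check counts.

    f_x_at_y: occurrences of X in the snippets retrieved for Y.
    f_y_at_x: occurrences of Y in the snippets retrieved for X.
    f_x, f_y: total page/snippet counts for X and Y alone.
    These formulas are a sketch, not the paper's exact definitions.
    """
    if f_x_at_y == 0 or f_y_at_x == 0:
        # Double checking: both directions must attest the association,
        # otherwise the pair is scored zero.
        return {"dice": 0.0, "jaccard": 0.0, "overlap": 0.0, "cosine": 0.0}
    shared = f_x_at_y + f_y_at_x
    return {
        "dice": shared / (f_x + f_y),
        "jaccard": shared / (f_x + f_y - shared),
        "overlap": shared / (2 * min(f_x, f_y)),
        "cosine": shared / (2 * math.sqrt(f_x * f_y)),
    }
```

With hypothetical counts, `association_measures(10, 20, 100, 200)` yields a Dice score of 30/300 = 0.1, while any pair failing the double check in either direction scores 0.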

Cited by 112 publications
(88 citation statements)
References 7 publications
“…It has applications in many natural language processing tasks, such as Textual Entailment, Word Sense Disambiguation or Information Extraction, and other related areas like Information Retrieval. The techniques used to solve this problem can be roughly classified into two main categories: those relying on pre-existing knowledge resources (thesauri, semantic networks, taxonomies or encyclopedias) (Alvarez and Lim, 2007;Yang and Powers, 2005;Hughes and Ramage, 2007) and those inducing distributional properties of words from corpora (Sahami and Heilman, 2006;Chen et al, 2006;Bollegala et al, 2007).…”
Section: Introduction
confidence: 99%
“…We do not normalize the similarity scores to [0, 1] range in our experiments because the evaluation metrics we use are insensitive to linear transformations of similarity scores. Table 1 compares the proposed method against Miller-Charles ratings (MC), and previously proposed web-based semantic similarity measures: Jaccard, Dice, Overlap, PMI (Bollegala et al, 2007), Normalized Google Distance (NGD) (Cilibrasi and Vitanyi, 2007), Sahami and Heilman (SH) (2006), co-occurrence double checking model (CODC) (Chen et al, 2006), and support vector machine-based (SVM) approach (Bollegala et al, 2007). The bottom row of Table 1 shows the Pearson correlation coefficient of similarity scores produced by each algorithm with MC.…”
Section: Computing Semantic Similarity
confidence: 99%
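The evaluation quoted above reports Pearson correlation coefficients against Miller-Charles ratings and notes that normalization to [0, 1] is unnecessary because the metric is insensitive to linear transformations. A minimal Pearson implementation makes that invariance concrete (the score lists here are hypothetical):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists,
    e.g. algorithm similarity scores vs. human similarity ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Applying any linear transformation `a * x + b` (with `a > 0`) to one score list leaves the coefficient unchanged, which is why unnormalized similarity scores can be compared directly against the human ratings.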
“…They did not compare their similarity measure with taxonomy-based similarity measures. Chen et al, (2006) propose a web-based double-checking model to compute the semantic similarity between words. For two words X and Y , they collect snippets for each word from a web search engine.…”
Section: Related Work
confidence: 99%
“…PageRank-like techniques have also been used to calculate the similarity. Chen et al [25] have proposed to exploit the text snippets returned by a Web search engine as an important measure in computing the semantic similarity between two words. In their approach, the text snippets for the two words, A and B, are collected and the occurrences of word A are counted in the snippet of word B, and vice versa.…”
Section: Identifying Relationships Between Concepts
confidence: 99%
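The passage above describes the core counting step of the double-checking model: collect snippets for each word, then count occurrences of word A in the snippets retrieved for word B, and vice versa. A sketch of that step, assuming snippet lists stand in for actual search-engine results:

```python
import re

def double_check_counts(snippets_for_a, snippets_for_b, word_a, word_b):
    """Count word_b's occurrences in the snippets retrieved for word_a,
    and word_a's occurrences in the snippets retrieved for word_b.
    Snippet lists are hypothetical placeholders for real search results."""
    pattern_a = re.compile(r"\b%s\b" % re.escape(word_a), re.IGNORECASE)
    pattern_b = re.compile(r"\b%s\b" % re.escape(word_b), re.IGNORECASE)
    # Occurrences of B inside A's snippets, and of A inside B's snippets.
    f_b_at_a = sum(len(pattern_b.findall(s)) for s in snippets_for_a)
    f_a_at_b = sum(len(pattern_a.findall(s)) for s in snippets_for_b)
    return f_a_at_b, f_b_at_a
```

Whole-word matching (`\b` boundaries) keeps, say, "car" from matching inside "cargo"; whether the original model matched case-insensitively or tokenized differently is an assumption here.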