Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL - ACL '06 2006
DOI: 10.3115/1220175.1220302
Novel association measures using web search with double checking

Abstract: A web search with double checking model is proposed to explore the web as a live corpus. Five association measures including variants of Dice, Overlap Ratio, Jaccard, and Cosine, as well as CoOccurrence Double Check (CODC), are presented. In the experiments on Rubenstein-Goodenough's benchmark data set, the CODC measure achieves correlation coefficient 0.8492, which competes with the performance (0.8914) of the model using WordNet. The experiments on link detection of named entities using the strategies of dir…
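The abstract lists association measures built from bidirectional "double check" snippet counts. The sketch below illustrates how such counts could feed Dice-, Jaccard-, Overlap-, and Cosine-style scores; the exact variants defined in the paper may differ, and the count names (`f_x_at_y` etc.) are hypothetical placeholders, not the authors' notation.

```python
import math

def association_measures(f_x_at_y, f_y_at_x, f_x, f_y):
    """Illustrative association measures over double-check counts.

    f_x_at_y: occurrences of X in the snippets retrieved for Y.
    f_y_at_x: occurrences of Y in the snippets retrieved for X.
    f_x, f_y: total page/snippet counts for X and Y alone.
    These formulas are a sketch, not the paper's exact definitions.
    """
    if f_x_at_y == 0 or f_y_at_x == 0:
        # Double checking: both directions must attest the association,
        # otherwise the pair is scored zero.
        return {"dice": 0.0, "jaccard": 0.0, "overlap": 0.0, "cosine": 0.0}
    shared = f_x_at_y + f_y_at_x
    return {
        "dice": shared / (f_x + f_y),
        "jaccard": shared / (f_x + f_y - shared),
        "overlap": shared / (2 * min(f_x, f_y)),
        "cosine": shared / (2 * math.sqrt(f_x * f_y)),
    }
```

With hypothetical counts, `association_measures(10, 20, 100, 200)` yields a Dice score of 30/300 = 0.1, while any pair failing the double check in either direction scores 0.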

Cited by 112 publications
(88 citation statements)
References 7 publications
“…It has applications in many natural language processing tasks, such as Textual Entailment, Word Sense Disambiguation or Information Extraction, and other related areas like Information Retrieval. The techniques used to solve this problem can be roughly classified into two main categories: those relying on pre-existing knowledge resources (thesauri, semantic networks, taxonomies or encyclopedias) (Alvarez and Lim, 2007;Yang and Powers, 2005;Hughes and Ramage, 2007) and those inducing distributional properties of words from corpora (Sahami and Heilman, 2006;Chen et al, 2006;Bollegala et al, 2007).…”
Section: Introduction
confidence: 99%
“…We do not normalize the similarity scores to [0, 1] range in our experiments because the evaluation metrics we use are insensitive to linear transformations of similarity scores. Table 1 compares the proposed method against Miller-Charles ratings (MC), and previously proposed web-based semantic similarity measures: Jaccard, Dice, Overlap, PMI (Bollegala et al, 2007), Normalized Google Distance (NGD) (Cilibrasi and Vitanyi, 2007), Sahami and Heilman (SH) (2006), co-occurrence double checking model (CODC) (Chen et al, 2006), and support vector machine-based (SVM) approach (Bollegala et al, 2007). The bottom row of Table 1 shows the Pearson correlation coefficient of similarity scores produced by each algorithm with MC.…”
Section: Computing Semantic Similarity
confidence: 99%
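The evaluation quoted above reports Pearson correlation coefficients against Miller-Charles ratings and notes that normalization to [0, 1] is unnecessary because the metric is insensitive to linear transformations. A minimal Pearson implementation makes that invariance concrete (the score lists here are hypothetical):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists,
    e.g. algorithm similarity scores vs. human similarity ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Applying any linear transformation `a * x + b` (with `a > 0`) to one score list leaves the coefficient unchanged, which is why unnormalized similarity scores can be compared directly against the human ratings.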
“…They did not compare their similarity measure with taxonomy-based similarity measures. Chen et al, (2006) propose a web-based double-checking model to compute the semantic similarity between words. For two words X and Y , they collect snippets for each word from a web search engine.…”
Section: Related Work
confidence: 99%
“…PageRank-like techniques have also been used to calculate the similarity. Chen et al [25] have proposed to exploit the text snippets returned by a Web search engine as an important measure in computing the semantic similarity between two words. In their approach, the text snippets for the two words, A and B, are collected and the occurrences of word A are counted in the snippet of word B, and vice versa.…”
Section: Identifying Relationships Between Concepts
confidence: 99%
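The passage above describes the core counting step of the double-checking model: collect snippets for each word, then count occurrences of word A in the snippets retrieved for word B, and vice versa. A sketch of that step, assuming snippet lists stand in for actual search-engine results:

```python
import re

def double_check_counts(snippets_for_a, snippets_for_b, word_a, word_b):
    """Count word_b's occurrences in the snippets retrieved for word_a,
    and word_a's occurrences in the snippets retrieved for word_b.
    Snippet lists are hypothetical placeholders for real search results."""
    pattern_a = re.compile(r"\b%s\b" % re.escape(word_a), re.IGNORECASE)
    pattern_b = re.compile(r"\b%s\b" % re.escape(word_b), re.IGNORECASE)
    # Occurrences of B inside A's snippets, and of A inside B's snippets.
    f_b_at_a = sum(len(pattern_b.findall(s)) for s in snippets_for_a)
    f_a_at_b = sum(len(pattern_a.findall(s)) for s in snippets_for_b)
    return f_a_at_b, f_b_at_a
```

Whole-word matching (`\b` boundaries) keeps, say, "car" from matching inside "cargo"; whether the original model matched case-insensitively or tokenized differently is an assumption here.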