Lubomir Stanchev scite author profile

Lubomir Stanchev

5 Publications

74 Citation Statements Received

56 Citation Statements Given

Affiliations

California Polytechnic State University, Indiana University – Purdue University Fort Wayne, Cal Poly Corporation

Publications

Creating a Similarity Graph from WordNet

Stanchev

2014

Semantic Document Clustering Using a Similarity Graph

Stanchev

2016

Document clustering addresses the problem of identifying groups of similar documents without human supervision. Unlike most existing solutions, which cluster documents based on keyword matching, we propose an algorithm that considers the meaning of the terms in the documents. For example, a document that contains the words "dog" and "cat" multiple times may be placed in the same category as a document that contains the word "pet", even if the two documents share only noise words. Our semantic clustering algorithm is based on a similarity graph that stores the degree of semantic relationship between terms (extracted from WordNet), where a term can be a word or a phrase. We experimentally validate our algorithm on the Reuters-21578 benchmark, which contains 11,362 newswire stories that are grouped into 82 categories using human judgment. We apply the k-means clustering algorithm to group the documents using two similarity metrics: one based on keyword matching and one that uses the similarity graph. We show that the second approach produces higher precision and recall, which means that it more closely matches the human categorization.
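The contrast between the two similarity metrics in this abstract can be illustrated with a minimal Python sketch. The abstract does not specify the paper's actual metric, so everything here is an assumption made for illustration: the toy graph and its weights, the helper names (`keyword_vector`, `expanded_vector`, `cosine`), and the choice to expand each term vector with its weighted graph neighbors before taking a cosine.

```python
import math
from collections import Counter
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]

# Hypothetical similarity graph; the weights are invented for illustration.
TOY_GRAPH: Graph = {
    "dog": {"pet": 0.6},
    "cat": {"pet": 0.5},
    "pet": {"dog": 0.3, "cat": 0.3},
}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse term vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_vector(tokens: List[str]) -> Dict[str, float]:
    """Plain bag-of-words counts: the keyword-matching baseline."""
    return {term: float(count) for term, count in Counter(tokens).items()}


def expanded_vector(tokens: List[str], graph: Graph) -> Dict[str, float]:
    """Bag-of-words counts plus weighted counts for each term's graph neighbors."""
    base = keyword_vector(tokens)
    expanded = dict(base)
    for term, count in base.items():
        for neighbor, weight in graph.get(term, {}).items():
            expanded[neighbor] = expanded.get(neighbor, 0.0) + count * weight
    return expanded


doc_a = ["dog", "cat", "dog", "cat"]
doc_b = ["pet", "pet"]

keyword_score = cosine(keyword_vector(doc_a), keyword_vector(doc_b))
semantic_score = cosine(expanded_vector(doc_a, TOY_GRAPH),
                        expanded_vector(doc_b, TOY_GRAPH))
print(keyword_score, semantic_score)
```

In this toy setup the two documents share no terms, so the keyword score is zero, while the graph-expanded score is positive because "dog" and "cat" both link to "pet". Either score could then be fed to a clustering algorithm such as k-means.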

Semantic Document Clustering Using Information from WordNet and DBPedia

Stanchev

2018

Creating a Phrase Similarity Graph from Wikipedia

Stanchev

2014

The paper addresses the problem of modeling the relationship between phrases in English using a similarity graph. The mathematical model stores data about the strength of the relationship between phrases, expressed as a decimal number. Both structured data from Wikipedia, such as the fact that the Wikipedia page with title "Dog" belongs to the Wikipedia category "Domesticated animals", and textual descriptions, such as the fact that the Wikipedia page with title "Dog" contains the word "wolf" thirty-one times, are used in creating the graph. The quality of the graph data is validated by comparing the similarity of pairs of phrases, as computed by our software that uses the graph, with the results of studies that were performed with human subjects. To the best of our knowledge, our software produces better correlation with the results of both the Miller and Charles study and the WordSimilarity-353 study than any other published research.

Measuring the Strength of the Semantic Relationship Between Words

Stanchev

2015

Int. J. Artif. Intell. Tools

We propose a novel way of extracting the strength of the semantic relationship between words from semi-structured sources, such as WordNet. Unlike existing approaches that only explore the structured information (e.g., the hypernym relationship in WordNet), we present a framework that allows us to utilize all available information, including natural text descriptions. Our approach constructs a similarity graph that stores the strength of the semantic relationship between words. Specifically, an edge between two words describes the probability that someone who is interested in resources about the first word will also be interested in resources about the second word. Note that the graph is asymmetric because the probability that someone is interested in the second word given that they are interested in the first word is not the same as the probability that they are interested in the first word given that they are interested in the second word. The similarity between any two words in the graph can be computed as a function of the directed paths between the two nodes in the graph that represent the words. We evaluate the quality of the data in the similarity graph by comparing the similarity of pairs of words, as computed by our software that uses the graph, with the results of studies that were performed with human subjects. To the best of our knowledge, our software produces better correlation with the results of both the Miller and Charles study and the WordSimilarity-353 study than any other published research. We also present an extended evaluation section that describes how the different heuristics that we use affect the correlation score.
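The idea of an asymmetric graph whose directed paths determine similarity can be sketched in a few lines of Python. The abstract does not give the actual path function, so this sketch makes its own assumptions: a path is scored by the product of its edge probabilities, the directed score is the best such path within a hop limit, and the symmetric score averages the two directions. The graph, its weights, and the names `directed_similarity` and `similarity` are hypothetical.

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]

# Hypothetical directed graph: graph[a][b] is the assumed probability that
# someone interested in word a is also interested in word b.
TOY_GRAPH: Graph = {
    "dog": {"pet": 0.6, "wolf": 0.2},
    "cat": {"pet": 0.5},
    "pet": {"dog": 0.3, "cat": 0.3},
    "wolf": {"dog": 0.1},
}


def directed_similarity(graph: Graph, source: str, target: str,
                        max_hops: int = 3) -> float:
    """Best product of edge weights over simple directed paths source -> target.

    Only paths with at most max_hops edges are explored (an assumed cutoff).
    """
    best = 0.0
    stack = [(source, 1.0, (source,))]
    while stack:
        node, score, path = stack.pop()
        if node == target:
            best = max(best, score)
            continue
        if len(path) - 1 >= max_hops:
            continue  # path already uses the maximum number of edges
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                stack.append((neighbor, score * weight, path + (neighbor,)))
    return best


def similarity(graph: Graph, a: str, b: str, max_hops: int = 3) -> float:
    """Symmetric score: the average of the two directed scores (an assumption)."""
    forward = directed_similarity(graph, a, b, max_hops)
    backward = directed_similarity(graph, b, a, max_hops)
    return (forward + backward) / 2.0


print(directed_similarity(TOY_GRAPH, "dog", "pet"))
print(directed_similarity(TOY_GRAPH, "pet", "dog"))
print(similarity(TOY_GRAPH, "dog", "cat"))
```

In the toy graph the score from "dog" to "pet" differs from the score from "pet" to "dog", which mirrors the asymmetry described in the abstract. "dog" and "cat" have no direct edge, yet they receive a nonzero score through the two-edge paths via "pet".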
