2014 IEEE International Conference on Semantic Computing 2014
DOI: 10.1109/icsc.2014.22

Creating a Phrase Similarity Graph from Wikipedia

Abstract: The paper addresses the problem of modeling the relationship between phrases in English using a similarity graph. The mathematical model stores data about the strength of the relationship between phrases expressed as a decimal number. Both structured data from Wikipedia, such as the fact that the Wikipedia page with title "Dog" belongs to the Wikipedia category "Domesticated animals", and textual descriptions, such as the fact that the Wikipedia page with title "Dog" contains the word "wolf" thirty-one times, are used in creati…

Cited by 8 publications
(9 citation statements)
References 19 publications
“…The reason why our system has higher value for the MAP measure than the Apache Lucene system is because we find not only documents that contain words from the input query, but also documents that contain words and phrases that are semantically similar to those in the input query. In comparison with our previous work ( [39]), the presented system performs better because our old algorithm considers only disjoint paths between the input query and each of the documents and it does not take into account the complex interweaving network of edges that can exist in the probabilistic graph.…”
Section: Introduction
confidence: 89%
“…Second, let us quickly examine the algorithm from [39]. Given a document d and a query q, the scoring function is defined as shown in Equations 4 and 5.…”
Section: P P
confidence: 99%
“…We also showed that the fine-tuned algorithm improves these results even further and also gives us improved results on the entropy measure as compare to the cosine similarity algorithm. One area for future research is using an extended version of the similarity graph that contains information from Wikipedia [37] to perform document clustering. One challenge in this area is that the extended graph is relatively big (more than 10 GB) and computing the distance between documents can be computationally expensive.…”
Section: Conclusion and Future Research
confidence: 99%