2009
DOI: 10.1007/978-3-642-01307-2_62

Clustering Documents Using a Wikipedia-Based Concept Representation

Abstract: This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation by mapping the terms and phrases within documents to their corresponding articles (or concepts) in Wikipedia. We also develop a similarity measure that evaluates the semantic relatedness between the concept sets of two documents. We test the concept-based representation and the similarity measure on two standard text document datasets. …
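
The abstract does not give the formula for the concept-set similarity measure. As an illustration only, the sketch below compares two concept sets with a symmetric best-match average: each concept is paired with its most related concept in the other document. The relatedness function, the toy scores in `TOY_REL`, and the name `best_match_similarity` are hypothetical, and this is not claimed to be the paper's measure.

```python
from typing import Callable, Dict, FrozenSet, Set

# A relatedness function maps two Wikipedia concepts to a score in [0, 1].
Relatedness = Callable[[str, str], float]


def best_match_similarity(a: Set[str], b: Set[str], rel: Relatedness) -> float:
    """Symmetric best-match average of concept relatedness (hypothetical).

    For every concept in one set, take its highest relatedness to any concept
    in the other set and average those scores; then average both directions
    so that sim(a, b) == sim(b, a).
    """
    if not a or not b:
        return 0.0

    def directed(src: Set[str], dst: Set[str]) -> float:
        return sum(max(rel(c, d) for d in dst) for c in src) / len(src)

    return 0.5 * (directed(a, b) + directed(b, a))


# Toy relatedness scores; a real system would derive them from Wikipedia
# (for example from shared links), which is not modeled here.
TOY_REL: Dict[FrozenSet[str], float] = {
    frozenset({"Cluster analysis", "K-means clustering"}): 0.8,
    frozenset({"Cluster analysis", "Document classification"}): 0.5,
}


def toy_relatedness(c1: str, c2: str) -> float:
    if c1 == c2:
        return 1.0  # a concept is fully related to itself
    return TOY_REL.get(frozenset({c1, c2}), 0.0)


doc1 = {"Cluster analysis", "Wikipedia"}
doc2 = {"K-means clustering", "Wikipedia"}
print(best_match_similarity(doc1, doc2, toy_relatedness))  # about 0.9
```

Unlike plain cosine similarity over concept identifiers, this kind of measure gives partial credit when two documents mention different but related concepts (here "Cluster analysis" and "K-means clustering").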

Cited by 66 publications (27 citation statements). References: 12 publications.
“…Huang, Milne, Frank, & Witten [10] mapped candidate phrases in the given document to Wikipedia articles by leveraging an informative and compact vocabulary: the collection of anchor texts in Wikipedia. The adopted method is closest to the one used by Huang et al. [10], where Wikipedia's anchor-text vocabulary is used to connect terms to Wikipedia articles.…”
Section: Related Work (mentioning; confidence: 99%)
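
The anchor-text mapping described in this citation statement can be sketched as a greedy longest-match lookup. This is a minimal sketch assuming a precomputed dictionary `anchor_vocab` from lower-cased anchor text to a single article title; real anchors are often ambiguous and need a disambiguation step, which is omitted here. The function name, `max_ngram`, and the toy vocabulary are hypothetical.

```python
from typing import Dict, List, Tuple


def map_terms_to_concepts(
    tokens: List[str],
    anchor_vocab: Dict[str, str],
    max_ngram: int = 3,
) -> List[Tuple[str, str]]:
    """Greedy left-to-right longest match against an anchor-text vocabulary.

    Returns (phrase, concept) pairs. Matches never overlap and each one
    consumes at least one token.
    """
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in anchor_vocab:
                matches.append((phrase, anchor_vocab[phrase]))
                i += n
                break
        else:
            i += 1  # no anchor text starts here; move to the next token
    return matches


# Hypothetical anchor-text vocabulary: lower-cased anchor -> article title.
ANCHORS = {
    "machine learning": "Machine learning",
    "learning": "Learning",
    "wikipedia": "Wikipedia",
    "text mining": "Text mining",
}

tokens = "Machine learning on Wikipedia helps text mining".split()
concepts = map_terms_to_concepts(tokens, ANCHORS)
print(concepts)
# [('machine learning', 'Machine learning'), ('wikipedia', 'Wikipedia'),
#  ('text mining', 'Text mining')]
```

Because longest matches are tried first, "learning" is absorbed into "machine learning" rather than mapped separately. Since every match consumes at least one token and matches do not overlap, a document never yields more concepts than it has terms.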
“…The adopted method is closest to the one used by Huang et al. [10], where Wikipedia's anchor-text vocabulary is used to connect terms to Wikipedia articles. In this way, the number of concepts in a document is no more than the number of terms.…”
Section: Related Work (mentioning; confidence: 99%)
“…A second group of algorithms leverages the semantic representation to reduce data dimensionality [19]. Finally, other methods define new similarity measures that take into account the semantic relations between concepts [9,8,21]. However, operating solely in the semantic space is not always the best choice for document clustering: even though the same concept can be expressed by different terms, sometimes each term is specific to a particular domain or language register.…”
Section: Introduction (mentioning; confidence: 99%)
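
The caveat above suggests not discarding the term space entirely. One simple way to do this, assumed here purely for illustration and not attributed to any of the cited works, is to interpolate a term-level cosine similarity with a concept-level one using a hypothetical weight `alpha`. The function names and the example vectors are also hypothetical.

```python
import math
from collections import Counter
from typing import Mapping


def cosine(u: Mapping[str, float], v: Mapping[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_similarity(
    terms_a: Mapping[str, float],
    terms_b: Mapping[str, float],
    concepts_a: Mapping[str, float],
    concepts_b: Mapping[str, float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical linear mix: alpha * term cosine + (1 - alpha) * concept cosine."""
    return alpha * cosine(terms_a, terms_b) + (1 - alpha) * cosine(concepts_a, concepts_b)


# Two documents that use different words ("car" vs "automobile") for the
# same Wikipedia concept.
terms_1 = Counter(["car", "engine"])
terms_2 = Counter(["automobile", "engine"])
concepts_1 = Counter(["Automobile", "Engine"])
concepts_2 = Counter(["Automobile", "Engine"])

print(cosine(terms_1, terms_2))        # about 0.5
print(cosine(concepts_1, concepts_2))  # about 1.0
print(hybrid_similarity(terms_1, terms_2, concepts_1, concepts_2))  # about 0.75
```

Setting `alpha` to 1.0 recovers a purely term-based measure and 0.0 a purely concept-based one; intermediate values let domain-specific wording still contribute, which is the concern raised in the statement above.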
“…Concept-similarity-based approaches have been proposed for text classification [81] and text clustering [35,38].…”
Section: Short Text Clustering (mentioning; confidence: 99%)
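
Once a pairwise concept similarity is available, any clustering algorithm that accepts a similarity matrix can use it. The sketch below uses average-linkage agglomerative clustering, chosen only for illustration; it is not necessarily the algorithm used in [35,38] or in this paper. The function name and the matrix values are made up.

```python
from typing import List


def average_link_clusters(sim: List[List[float]], k: int) -> List[List[int]]:
    """Agglomerative clustering with average linkage on a similarity matrix.

    Starts with one cluster per document and repeatedly merges the pair of
    clusters with the highest mean pairwise similarity until k remain.
    `sim` is assumed to be a symmetric n x n matrix.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters: List[List[int]] = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best_score, best_i, best_j = float("-inf"), -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(sim[a][b] for a in clusters[i] for b in clusters[j])
                score = total / (len(clusters[i]) * len(clusters[j]))
                if score > best_score:
                    best_score, best_i, best_j = score, i, j
        clusters[best_i] = clusters[best_i] + clusters[best_j]
        del clusters[best_j]  # best_j > best_i, so best_i stays valid
    return clusters


# Made-up concept similarities for four documents.
S = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(average_link_clusters(S, k=2))  # [[0, 1], [2, 3]]
```

Each entry of `S` could come from a concept-set similarity such as the ones discussed above; the clustering step itself does not depend on how the similarities were computed.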