2009
DOI: 10.1007/978-3-642-03761-0_43

Document Clustering with K-tree

Abstract: This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines. Comme…

Cited by 7 publications (8 citation statements)
References 13 publications
“…A novel contribution of this paper is our investigation of Okapi BM25 (BM25) feature weighting. Only recently has BM25 been seriously considered in document clustering (de Vries and Geva 2008; Bashier and Rauber 2009; Whissell et al 2009; D'hondt et al 2010; Kutty et al 2010); with works that do use BM25 still being a small minority. Bashier and Rauber (2009) investigate relevance feedback using clustering.…”
Section: Introduction (mentioning)
confidence: 99%
“…Bashier and Rauber (2009) investigate relevance feedback using clustering. de Vries and Geva (2008) use BM25 weighting when clustering XML documents, but offer no comparison to tf-idf weighting using their clustering method. Kutty et al (2010) also cluster XML documents using BM25 weighting, with the authors showing an improvement over tf-idf.…”
Section: Introduction (mentioning)
confidence: 99%
“…The standard measures of Purity, Entropy, NMI and F1 are used to determine the quality of clusters with regard to the categories. Negentropy [5] is also used. It measures the same system property as Entropy but it is normalized and inverted so scores fall between 0 and 1 where 0 is the worst and 1 is the best.…”
Section: Clustering Evaluation Measures (mentioning)
confidence: 99%
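
The statement above says only that Negentropy is Entropy normalized and inverted onto a 0 to 1 scale; it does not give a formula. The sketch below is one plausible reading, not the cited paper's definition: each cluster's label entropy is divided by log2 of the number of categories, averaged with cluster-size weights, and subtracted from 1. The function name `negentropy` and the choice of normalizer are assumptions made for illustration.

```python
import math
from collections import Counter


def negentropy(clusters, labels):
    """Size-weighted, normalized, inverted entropy of a clustering.

    Assumption: normalization divides each cluster's entropy by
    log2(number of distinct categories); the source does not specify this.
    Returns a score in [0, 1], where 1 is best and 0 is worst.
    """
    n_categories = len(set(labels))
    if n_categories < 2:
        # With a single category every cluster is trivially pure.
        return 1.0

    # Group the category labels by the cluster they were assigned to.
    members = {}
    for cluster_id, label in zip(clusters, labels):
        members.setdefault(cluster_id, []).append(label)

    total = len(labels)
    weighted_entropy = 0.0
    for cluster_labels in members.values():
        size = len(cluster_labels)
        # Shannon entropy of the category distribution inside this cluster.
        entropy = -sum(
            (count / size) * math.log2(count / size)
            for count in Counter(cluster_labels).values()
        )
        weighted_entropy += (size / total) * (entropy / math.log2(n_categories))

    return 1.0 - weighted_entropy
```

Under this reading, a clustering that separates the categories perfectly scores 1, and a single cluster holding two equally frequent categories scores 0.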
“…Micro-purity of the clustering solution ω is obtained as a weighted sum of individual cluster purity. Macro-purity is the unweighted arithmetic mean based on the total number of categories [5].…”
Section: Clustering Evaluation Measures (mentioning)
confidence: 99%
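
The two purity variants quoted above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a cluster's purity is taken to be the fraction of its members that belong to its most frequent category, micro-purity weights each cluster's purity by cluster size, and macro-purity divides the sum of cluster purities by the number of distinct categories, following the wording "based on the total number of categories". The function names are hypothetical and do not come from the cited work.

```python
from collections import Counter


def cluster_purities(clusters, labels):
    """Return {cluster_id: (purity, size)} for a flat clustering."""
    members = {}
    for cluster_id, label in zip(clusters, labels):
        members.setdefault(cluster_id, []).append(label)
    stats = {}
    for cluster_id, cluster_labels in members.items():
        # Count of the most frequent category in this cluster.
        majority = Counter(cluster_labels).most_common(1)[0][1]
        stats[cluster_id] = (majority / len(cluster_labels), len(cluster_labels))
    return stats


def micro_purity(clusters, labels):
    """Cluster purities weighted by cluster size."""
    stats = cluster_purities(clusters, labels)
    total = sum(size for _, size in stats.values())
    return sum(purity * size for purity, size in stats.values()) / total


def macro_purity(clusters, labels):
    """Unweighted sum of cluster purities over the number of categories.

    Assumption: the divisor is the number of distinct categories, as the
    quoted statement suggests, not the number of clusters.
    """
    stats = cluster_purities(clusters, labels)
    return sum(purity for purity, _ in stats.values()) / len(set(labels))


# Three clusters over six documents drawn from three categories.
clusters = [0, 0, 0, 1, 1, 2]
labels = ["a", "a", "b", "b", "b", "c"]
micro = micro_purity(clusters, labels)
macro = macro_purity(clusters, labels)
```

In this toy example the cluster purities are 2/3, 1 and 1, so micro-purity is 5/6 and macro-purity is 8/9. The two divisors coincide only when the number of clusters equals the number of categories.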
“…In another work, Zhang et al [13] make use of the hyperlink structure between XML documents through an extension of a machine learning method based on the Self Organizing Maps for graphs. De Vries et al [9] use K-Trees to cluster XML documents so that they can obtain clusters in good quality with a low complexity method. Lastly, Tran et al [12] construct a latent semantic kernel to measure the similarity between content of the XML documents.…”
Section: Introduction (mentioning)
confidence: 99%