Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases

Can, Fazlı; Ozkarahan, Esen A.

doi:10.1145/99935.99938

Cited by 104 publications

(102 citation statements)

References 21 publications

“…Originally, Yao's formula determines the number of disk pages to be accessed to retrieve the related records of a query under the assumption that database records are randomly distributed among the same size pages. Later Can and Ozkarahan [6] adapted the formula for environments for pages (clusters) with different sizes. For using Yao's formula in our problem we treat the individual clusters of C_s as queries and determine how their members (like the related documents of a query) are distributed in the clustering structure C_t.…”

Section: The Methods (mentioning)

confidence: 99%
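
The two forms of Yao's formula referred to in the statement above can be sketched as follows. This is a minimal illustration, not code from either paper: the function names are hypothetical, and the variable-size variant assumes the usual form in which each cluster contributes the probability that at least one of the k selected records falls inside it.

```python
from math import comb


def yao_expected_pages(n, m, k):
    """Yao's formula: expected number of pages touched when k of n records
    are selected at random and the records fill m equal-size pages."""
    if n % m != 0:
        raise ValueError("equal-size pages require n to be divisible by m")
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    page_size = n // m
    # comb(n - page_size, k) / comb(n, k) is the chance a given page is missed.
    return m * (1 - comb(n - page_size, k) / comb(n, k))


def expected_clusters_hit(cluster_sizes, k):
    """Assumed variable-size variant: sum over clusters of the probability
    that a random selection of k records hits the cluster at least once."""
    n = sum(cluster_sizes)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    total = comb(n, k)
    return sum(1 - comb(n - size, k) / total for size in cluster_sizes)
```

Read against the quoted text, k would be the size of one source cluster and `cluster_sizes` the sizes of the target clusters; that mapping is an interpretation of the statement, not something it spells out.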

“…We refer to the entity n_t as the Translation Relationship Index (TRI) and check the merit of the index by comparing it with the value of n_tr. The existence of n_tr, which can be directly computed by the modified Yao's formula [6], gives TRI the attribute of a measurement criterion, since n_tr provides a benchmark or a reference point. If the observed TRI value indicates that the relationship is different from random (i.e., if n_t is smaller than n_tr), we obtain the baseline distribution for n_tr using the Monte Carlo approach to decide if the difference is significant.…”

Section: The Methods (mentioning)

confidence: 99%
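
A Monte Carlo baseline of the kind mentioned in the statement above might look like the sketch below. The quoted text does not describe the randomization procedure, so everything here is an assumption: blocks are taken to be paired by position between source and target, the random baseline is produced by shuffling the target cluster labels (which preserves target cluster sizes), and all function names are hypothetical.

```python
import random


def avg_target_clusters(source_labels, target_labels):
    """Average number of distinct target clusters spanned by the members
    of each source cluster (a stand-in for the observed index n_t)."""
    spans = {}
    for s, t in zip(source_labels, target_labels):
        spans.setdefault(s, set()).add(t)
    return sum(len(ts) for ts in spans.values()) / len(spans)


def monte_carlo_baseline(source_labels, target_labels, trials=1000, seed=0):
    """Index values obtained when target labels are randomly reassigned."""
    rng = random.Random(seed)
    shuffled = list(target_labels)
    baseline = []
    for _ in range(trials):
        rng.shuffle(shuffled)  # keeps cluster sizes, breaks the pairing
        baseline.append(avg_target_clusters(source_labels, shuffled))
    return baseline


def empirical_p_value(observed, baseline):
    """Fraction of random trials that are at least as small as observed."""
    return sum(1 for value in baseline if value <= observed) / len(baseline)
```

A small empirical p-value would suggest that the observed index is lower than chance alone tends to produce, which matches the direction of the test described in the quote (n_t smaller than n_tr).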

Translation Relationship Quantification: A Cluster-Based Approach and its Application to Shakespeare’s Sonnets

Can

Karbeyaz

2010

Lecture Notes in Electrical Engineering

Self Cite

Abstract. We introduce a method for quantifying translation relationship between source and target texts. In this method, we partition source and target texts into corresponding blocks and cluster them separately using word phrases extracted by a suffix tree approach. We quantify the translation relationship by examining the similarity between source and target clustering structures. In this comparison we aim to observe that their similarity is meaningful, i.e., it is significantly different from random. The method is based on the hypothesis that similarities and dissimilarities among the source blocks will not be lost in translation and reappear among target blocks. For testing we use Shakespeare's sonnets and its translation in Turkish. The results show that our method successfully quantifies translation relationships.

“…It is a seed-oriented, partitioning, single-pass, linear-time clustering algorithm introduced in [3]. The main goal of C³M is to convey the relationships among documents using a two-stage probability experiment.…”

Section: Clustering (mentioning)

confidence: 99%

“…If none of the seeds covers the non-seed document, then, it is directly added to the Others cluster. Detailed information about C³M can be found in [3]. Modified sequential k-means algorithm.…”

Section: Clustering (mentioning)

confidence: 99%
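
The seed-based assignment described in the two statements above can be illustrated with a small sketch. It is a simplified reading of C³M, not the authors' implementation: it assumes a binary document-by-term matrix in which every document has at least one term, takes the cover coefficient c_ij as the two-stage probability of picking a term from document i and then document j from that term, rounds the sum of the diagonal entries to get the number of clusters, and ignores tie handling among seed candidates. All function names are hypothetical.

```python
def cover_coefficients(D):
    """c[i][j]: probability of selecting document j in a two-stage draw,
    first a term of document i, then a document containing that term."""
    m, n = len(D), len(D[0])
    row_sums = [sum(row) for row in D]
    col_sums = [sum(D[i][k] for i in range(m)) for k in range(n)]
    C = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            shared = sum(D[i][k] * D[j][k] / col_sums[k]
                         for k in range(n) if col_sums[k])
            C[i][j] = shared / row_sums[i]
    return C


def c3m_sketch(D):
    """Pick seeds by seed power, then attach each non-seed document to the
    seed that covers it most; uncovered documents go to an Others list."""
    C = cover_coefficients(D)
    m = len(D)
    delta = [C[i][i] for i in range(m)]        # decoupling coefficients
    n_clusters = max(1, round(sum(delta)))     # assumed rounding rule
    power = [delta[i] * (1 - delta[i]) * sum(D[i]) for i in range(m)]
    seeds = sorted(range(m), key=lambda i: power[i], reverse=True)[:n_clusters]
    clusters = {s: [s] for s in seeds}
    others = []
    for i in range(m):
        if i in clusters:
            continue
        best = max(seeds, key=lambda s: C[i][s])
        if C[i][best] > 0:
            clusters[best].append(i)
        else:
            others.append(i)                   # no seed covers document i
    return clusters, others
```

Each non-seed document is visited once, which is consistent with the single-pass description in the first statement; computing the full coefficient matrix, as this sketch does for readability, is not linear-time.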

A New Approach to Search Result Clustering and Labeling

Turel

Can

2011

Information Retrieval Technology

Self Cite

Abstract. Search engines present query results as a long ordered list of web snippets divided into several pages. Post-processing of retrieval results for easier access of desired information is an important research problem. In this paper, we present a novel search result clustering approach to split the long list of documents returned by search engines into meaningfully grouped and labeled clusters. Our method emphasizes clustering quality by using cover coefficient-based and sequential k-means clustering algorithms. A cluster labeling method based on term weighting is also introduced for reflecting cluster contents. In addition, we present a new metric that employs precision and recall to assess the success of cluster labeling. We adopt a comparative strategy to derive the relative performance of the proposed method with respect to two prominent search result clustering methods: Suffix Tree Clustering and Lingo. Experimental results in the publicly available AMBIENT and ODP-239 datasets show that our method can successfully achieve both clustering and labeling tasks.

“…More specifically, we use a partitioning type clustering algorithm, so-called Cover-Coefficient Based Clustering Methodology (C³M) [7], along with some index pruning techniques for clustering XML documents.…”

Section: Introduction (mentioning)

confidence: 99%

Exploiting Index Pruning Methods for Clustering XML Collections

Altıngövde

Atilgan

Ulusoy

2010

Focused Retrieval and Evaluation

Abstract. In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C³M) for clustering XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that for certain cases, it is possible to prune up to 70% of the collection (or, more specifically, underlying document vectors) and still generate a clustering structure that yields the same quality as that of the original collection, in terms of a set of evaluation metrics.
