2010
DOI: 10.1016/j.ins.2010.02.021

Pairwise-adaptive dissimilarity measure for document clustering

Cited by 30 publications (12 citation statements)
References 33 publications

“…However, the function quickly approaches an asymptote, limiting the impact of a single term. Although document retrieval and clustering are not identical tasks, there is now enough clustering research to suggest BM25 might aid in document clustering (Bashier and Rauber 2009; de Vries and Geva 2008; Whissell et al. 2009; D'hondt et al. 2010; Kutty et al. 2010). This, coupled with the fact that no thorough analysis on the specific benefits of BM25 in document clustering exists, led us to use BM25 in a clustering experiment similar to our initial experiment discussed in Sect.…”
Section: BM25-Based Feature Weighting (mentioning; confidence: 99%)
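The passage above notes that the BM25 term-frequency function "quickly approaches an asymptote." A minimal sketch of the standard Okapi BM25 tf component illustrates this, assuming the common defaults k1 = 1.2 and b = 0.75 (not values taken from the cited works). For a document of average length the weight approaches k1 + 1 as tf grows, so one very frequent term cannot dominate the vector.

```python
def bm25_tf(tf: float, doc_len: float, avg_doc_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 tf component: tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)).

    k1 and b default to commonly used values (an assumption, not from the source).
    """
    # Length normalization: longer-than-average documents are penalized via b.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)


if __name__ == "__main__":
    # Doubling tf yields ever smaller gains; the weight saturates below k1 + 1.
    for tf in (1, 2, 5, 10, 100, 1000):
        print(tf, round(bm25_tf(tf, doc_len=100, avg_doc_len=100), 3))
```

With doc_len equal to the average length, tf = 1 gives exactly 1.0 and every larger tf stays below k1 + 1 = 2.2.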
“…A novel contribution of this paper is our investigation of Okapi BM25 (BM25) feature weighting. Only recently has BM25 been seriously considered in document clustering (de Vries and Geva 2008; Bashier and Rauber 2009; Whissell et al. 2009; D'hondt et al. 2010; Kutty et al. 2010); with works that do use BM25 still being a small minority. Bashier and Rauber (2009) investigate relevance feedback using clustering.…”
Section: Introduction (mentioning; confidence: 99%)
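To use BM25 as a feature-weighting scheme for clustering rather than for query scoring, each document can be turned into a vector of BM25 term weights. The sketch below shows one way to do that for a tokenized corpus. The function name `bm25_vectors`, the defaults k1 = 1.2 and b = 0.75, and the non-negative idf variant log((N - n + 0.5) / (n + 0.5) + 1) are assumptions chosen for illustration, not settings reported by the cited works.

```python
import math
from collections import Counter


def bm25_vectors(docs, k1=1.2, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document (hypothetical helper)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    # Non-negative idf variant (the +1 inside the log); an assumed choice.
    idf = {t: math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0) for t, n in df.items()}
    vectors = []
    for d in docs:
        tf = Counter(d)
        norm = k1 * (1.0 - b + b * len(d) / avgdl)
        # Saturating tf component multiplied by idf.
        vectors.append({t: idf[t] * c * (k1 + 1.0) / (c + norm) for t, c in tf.items()})
    return vectors


if __name__ == "__main__":
    corpus = [["cluster", "documents", "cluster"], ["retrieve", "documents"], ["cluster", "terms"]]
    for vec in bm25_vectors(corpus):
        print(vec)
```

The resulting sparse vectors can be fed to any vector-space dissimilarity, such as the ones discussed further down the page.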
“…A term or feature can be a single word, multiple words, a phrase¹ or other indexing units [9, 10]. The weight of a term represents the importance of it in the relevant document and is assigned by a term weighting scheme [11]. Term frequency (tf) [3], inverse document frequency (idf) [12], or multiplication of tf and idf (tf-idf) [13–15] are commonly used term weighting schemes.…”
Section: Page 2 of 23 (mentioning; confidence: 99%)
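The three weighting schemes named in this passage come in many variants. A minimal sketch, assuming raw counts for tf, idf = log(N / df) and their product for tf-idf (one common textbook form, not necessarily the exact formulas of references [3] or [12]–[15]):

```python
import math
from collections import Counter


def tf(doc):
    """Raw term frequency: how often each term occurs in one tokenized document."""
    return Counter(doc)


def idf(docs):
    """Inverse document frequency log(N / df); terms present in every document get 0."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(n / c) for t, c in df.items()}


def tf_idf(docs):
    """tf-idf weight of each term in each document: tf(t, d) * idf(t)."""
    idf_w = idf(docs)
    return [{t: c * idf_w[t] for t, c in tf(d).items()} for d in docs]


if __name__ == "__main__":
    print(tf_idf([["a", "b", "a"], ["b", "c"]]))
```

In the usage example, term "b" occurs in both documents, so its idf and tf-idf weight are 0, while "a" gets 2 · log 2 because it occurs twice in a document that is the only one containing it.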
“…For instance, Euclidean distance is a geometric measure used to measure the distance between two vectors [18,19]. Cosine similarity compares two documents with respect to the angle between their vectors [11]. Similar to two previous measures, Manhattan distance is also a geometric measure [20,21].…”
Section: Page 2 of 23 (mentioning; confidence: 99%)
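The three measures named in this passage can be sketched over sparse term-weight dictionaries (a representation assumed here for illustration; missing terms count as weight 0). Euclidean and Manhattan distances are 0 for identical vectors, while cosine similarity depends only on the angle between vectors and is 1 when they point in the same direction. Returning 0.0 for a zero vector in the cosine function is an assumed convention.

```python
import math


def euclidean(u, v):
    """Straight-line (L2) distance between two sparse vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def manhattan(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    keys = set(u) | set(v)
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in keys)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros (assumed)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


if __name__ == "__main__":
    d1, d2 = {"a": 3.0}, {"b": 4.0}
    print(euclidean(d1, d2), manhattan(d1, d2), cosine_similarity(d1, d2))
```

For the two documents in the example, which share no terms, the Euclidean distance is 5.0, the Manhattan distance is 7.0 and the cosine similarity is 0.0. Scaling a vector changes both distances but leaves its cosine similarity unchanged, which is one reason cosine is popular for documents of different lengths.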
“…Pairwise-adaptive similarity dynamically selects the number of features prior to every similarity measurement. Based on this method, a relevant subset of terms is selected that will contribute to the measured distance between both related vectors [30].…”
Section: Related Work (mentioning; confidence: 99%)
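The statement above describes pairwise-adaptive similarity only as selecting a relevant subset of terms before each pairwise measurement. The sketch below illustrates that idea under a loudly stated assumption: for each pair, it keeps the union of each document's k highest-weighted terms and returns 1 minus the cosine similarity on that subset. The selection rule, the parameter `k` and the function names are hypothetical and not necessarily the exact procedure of the paper.

```python
import math


def top_k_terms(vec, k):
    """The k highest-weighted terms of a sparse vector (ties broken alphabetically)."""
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return {t for t, _ in ranked[:k]}


def pairwise_adaptive_dissimilarity(u, v, k):
    """Hypothetical sketch: cosine dissimilarity restricted to a per-pair term subset.

    Assumed selection rule: union of the top-k terms of u and of v.
    """
    selected = top_k_terms(u, k) | top_k_terms(v, k)
    us = {t: u[t] for t in selected if t in u}
    vs = {t: v[t] for t in selected if t in v}
    dot = sum(w * vs.get(t, 0.0) for t, w in us.items())
    nu = math.sqrt(sum(w * w for w in us.values()))
    nv = math.sqrt(sum(w * w for w in vs.values()))
    if nu == 0.0 or nv == 0.0:
        return 1.0  # treat an empty selection as maximally dissimilar (assumed)
    return 1.0 - dot / (nu * nv)


if __name__ == "__main__":
    d1 = {"cluster": 5.0, "document": 4.0, "noise1": 0.1}
    d2 = {"cluster": 5.0, "document": 4.0, "noise2": 0.1}
    print(pairwise_adaptive_dissimilarity(d1, d2, k=2))
```

With k = 2 in the example, the low-weight terms `noise1` and `noise2` are excluded for this pair, so the two documents come out (numerically) identical, whereas a cosine over all terms would register a small difference. The design intent is that each comparison uses only the terms most characteristic of the two documents involved.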