Proceedings of the 20th International Conference Companion on World Wide Web 2011
DOI: 10.1145/1963192.1963249

Comparative study of clustering techniques for short text documents

Cited by 59 publications (35 citation statements)
References 4 publications
“…They compare DPMFP with four other clustering models: EM text classification (EM-TC) [25], K-means [16], LDA [3] and exponential-family approximation of the Dirichlet compound multinomial distribution (EDCM) [8]; they find that DPMFP performs best. In the context of short text documents, Rangrej et al [27] compare three clustering algorithms including Kmeans, Singular Value Decomposition and Affinity Propagation [9] on a small set of tweets and find that Affinity Propagation outperforms the other two, but the complexity of Affinity Propagation is quadratic in the number of documents. Tsur et al [33], Yin [38], and Yu et al [40] focus on the problem of online clustering of a stream of tweets.…”
Section: User Clustering and Text Clustering (mentioning)
confidence: 99%
“…He proposed and described a method to determine the most appropriate topic model for tweet clustering. Rangrej et al (2011) compared various document clustering techniques including k-means, SVD-based method and a graph-based approach and compared their performance on short text data collected from Twitter. Tweet Motif that clusters Twitter messages by frequent significant terms was presented by O'Connor et al (2010).…”
Section: Related Work (mentioning)
confidence: 99%
“…Since English is the most commonly used language in Twitter (Honey and Herring, 2009), the focus is on tweets written in English. Twitter provides a large quantity of short text in the form of tweets where each tweet represents a single document (Rangrej et al, 2011). Goyal (2011) stated that tweet similarity between two users is defined as "the cosine similarity between the documents formed by combining the tweets of a user into one".…”
Section: The Problem (mentioning)
confidence: 99%
“…Therefore, TDM [14] has the advantages of both EDM and CDM: it is scale invariant while allowing discrimination of collinear vectors. In terms of distance-measure performance, TDM [14,15,16,17] is one of the best distance measures for clustering text. For the reasons stated above, it is important to understand the behavior of TDM in a distributed environment, especially in DKM and DFCM.…”
Section: Introduction (mentioning)
confidence: 99%