Response to Comment on "Clustering by Passing Messages Between Data Points"

Frey, Brendan J.; Dueck, Delbert

doi:10.1126/science.1151268

Cited by 53 publications

(27 citation statements)

References 9 publications

“…The most significant advantage of SAP is that it is better than k-means in the foregoing evaluations while k-means runs 200 times (the best run is used to compare with SAP) and costs about twenty-fold of SAP in time. This result also confirms the one in [7], [30], and [31]. Furthermore, even after 10000 runs of k-means, with a size of 400 documents (F-measure: 0.406; Entropy: 0.677), we can't get similar results as SAP.…”

Section: General Comparison (supporting)

confidence: 78%

“…Many detailed analysis of the AP approach have been carried out (see for instance [30] and [31]) for various datasets with different scales. These studies show that for small datasets, there are only minor differences between traditional strategies (such as p-median model and vertex substitution heuristic) and Affinity Propagation clustering for both precision and CPU execution time.…”

Section: Related Work (mentioning)

confidence: 99%

“…These studies show that for small datasets, there are only minor differences between traditional strategies (such as p-median model and vertex substitution heuristic) and Affinity Propagation clustering for both precision and CPU execution time. Nevertheless, for large datasets, AP offers obvious advantages over existing methods [7,31]. In particular, in their work Frey and Dueck showed that an improvement in execution time of roughly 100 times is achieved on datasets of more than 10000 objects and ca.…”

Section: Related Work (mentioning)

confidence: 99%

“…500 clusters. Moreover, in [7][8][9][10][11][12] and [30][31][32][33], it has been identified that the similarity measurement has a great influence on AP clustering.…”

Section: Related Work (mentioning)

confidence: 99%

Text Clustering with Seeds Affinity Propagation

Guan, Shi, Marchese, et al. 2011. IEEE Trans. Knowl. Data Eng.

Based on an effective clustering algorithm, Affinity Propagation (AP), we present in this paper a novel semi-supervised text-clustering algorithm, called Seeds Affinity Propagation (SAP). There are two main contributions in our approach: (1) a new similarity metric that captures the structural information of texts; (2) a novel seed construction method to improve the semi-supervised clustering process. To study the performance of the new algorithm, we applied it to the benchmark data set Reuters-21578, and compared it to two state-of-the-art clustering algorithms, namely the k-means algorithm and the original AP algorithm. Furthermore, we have analyzed the individual impact of the two proposed contributions. Results show that the proposed similarity metric is more effective in text clustering (F-measures ca. 21% higher than in the AP algorithm) and that the proposed semi-supervised strategy achieves both better clustering results and faster convergence (using only 76% of the iterations of the original AP). The complete SAP algorithm obtains a higher F-measure (ca. 40% improvement over k-means and AP) and lower entropy (ca. 28% decrease over k-means and AP), significantly improves clustering execution time (twenty times faster than k-means), and provides enhanced robustness compared with all other methods.

“…To combat the second challenge, Frey et al [8] proposed a clustering technique called affinity propagation clustering (APC), which propagates affinity message between samples to search a high-quality set of clusters [9]. APC has been shown its usefulness in image segmentation [10,11], gene expressions [12] and text summarization [13].…”

Section: Introduction (mentioning)

confidence: 99%

Affinity Propagation Clustering Using Path Based Similarity

Jiang, Liao 2016. Algorithms

Clustering is a fundamental task in data mining. Affinity propagation clustering (APC) is an effective and efficient clustering technique that has been applied in various domains. APC iteratively propagates information between affinity samples, updates the responsibility matrix and availability matrix, and employs these matrices to choose cluster centers (or exemplars) of respective clusters. However, since it mainly uses negative Euclidean distance between exemplars and samples as the similarity between them, it is difficult to identify clusters with complex structure. Therefore, the performance of APC deteriorates on samples distributed with complex structure. To mitigate this problem, we propose an improved APC based on a path-based similarity (APC-PS). APC-PS firstly utilizes negative Euclidean distance to find exemplars of clusters. Then, it employs the path-based similarity to measure the similarity between exemplars and samples, and to explore the underlying structure of clusters. Next, it assigns non-exemplar samples to their respective clusters via that similarity. Our empirical study on synthetic and UCI datasets shows that the proposed APC-PS significantly outperforms original APC and other related approaches.
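
The message-passing loop this abstract describes (update the responsibility matrix, update the availability matrix, then read off exemplars) can be sketched in plain Python. This is a minimal illustration, not the APC-PS method and not any library's implementation: the function name, the damping factor, the fixed iteration count, the toy points, and the preference value of -5.0 are all assumptions chosen for the example. The similarity is the negative Euclidean distance mentioned in the abstract.

```python
import math


def affinity_propagation(S, damping=0.5, max_iter=200):
    """Cluster n >= 2 points from an n x n similarity matrix S (list of lists).

    S[k][k] is the "preference" of point k to become an exemplar.
    Returns (exemplars, labels), where labels[i] indexes into exemplars.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)

        # a(i,k) <- min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) <- sum over i' != k of max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # positive support from points other than k
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # A point is an exemplar when its self-availability plus
    # self-responsibility is positive.
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [], [-1] * n
    # Assign every point to its most similar exemplar.
    labels = [max(range(len(exemplars)), key=lambda c: S[i][exemplars[c]])
              for i in range(n)]
    for c, k in enumerate(exemplars):
        labels[k] = c  # an exemplar always belongs to its own cluster
    return exemplars, labels


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
preference = -5.0  # assumed value; lower preferences yield fewer clusters
S = [[preference if i == j else -math.dist(p, q)
      for j, q in enumerate(points)]
     for i, p in enumerate(points)]
exemplars, labels = affinity_propagation(S)
```

The responsibility update here costs O(n^3) per iteration because the rival maximum is recomputed for every pair; practical implementations track the largest and second-largest score per row to bring that down to O(n^2).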

Finding exemplars in dense data with affinity propagation on clusters of GPUs

Kurdziel, Boryczko 2012. Concurrency and Computation

This work presents an efficient implementation of affinity propagation (AP) on clusters of graphical processing units (GPUs). AP is a state-of-the-art method for finding exemplars in data sets described by similarity matrices. It is typically employed in crisp clustering applications. However, when finding exemplars in an n-pattern data set with dense, non-metric similarities, AP performs iterative processing of three n × n floating point matrices. One of them stores the similarities, and the other two store the values that will ultimately pinpoint the exemplars. For large similarity matrices, AP is therefore computationally expensive. Although matrix operations of AP are well suited for GPUs, its memory footprint limits the size of tasks that can be solved on one unit. We present, however, a decomposition scheme for AP that distributes the calculations over multiple GPUs, with low communication-to-computation ratio. Because of this favorable communication pattern, our implementation finds exemplars in large, dense similarity data efficiently, even when GPUs are connected by a slow network. Furthermore, by combining global device memory of multiple GPUs, it can solve problems that would not fit in a single unit.
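
The memory-footprint argument in this abstract comes down to simple arithmetic on three dense n × n matrices. The sketch below works it through under one assumption the abstract does not state: values are stored in single precision (4 bytes each). The function names and the example sizes are hypothetical, and the calculation ignores any working buffers a real implementation would need.

```python
import math


def ap_dense_memory_bytes(n, bytes_per_value=4, num_matrices=3):
    """Bytes needed to hold the dense n x n matrices used by AP."""
    return num_matrices * n * n * bytes_per_value


def max_patterns_for_memory(memory_bytes, bytes_per_value=4, num_matrices=3):
    """Largest n whose dense matrices fit into memory_bytes."""
    return math.isqrt(memory_bytes // (num_matrices * bytes_per_value))


# Example: footprint for 100,000 patterns, and the largest problem
# that fits into a hypothetical 6 GiB of device memory.
footprint = ap_dense_memory_bytes(100_000)
largest_n = max_patterns_for_memory(6 * 1024**3)
```

Because the footprint grows with n squared, doubling the available memory raises the largest solvable n only by a factor of about 1.41, which is the motivation for pooling the memory of several devices.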
