A fast kernel-based multilevel algorithm for graph clustering

Dhillon, Inderjit S.; Guan, Yuqiang; Kulis, Brian

doi:10.1145/1081870.1081948

Cited by 93 publications

(84 citation statements)

References 11 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…• Graclus (GRAC) [8]: This is a graph clustering algorithm that divides the nodes of a given weighted graph into clusters such that the sum of weights of the inter-cluster edges is minimized. We ran it on a new derived graph having the same set of nodes as G, while edges exist between any two nodes with a non-zero cost reduction and the weight on the edge equal to the cost reduction s(·).…”

Section: Methodsmentioning

confidence: 99%

“…In contrast, we use information-theoretic metrics to group nodes such that the graph representation is as compact as possible. Another problem with many of the widely-used clustering algorithms, such as METIS [17], Graclus [8], kmeans and spectral clustering [24], is that they require the user to specify the number of partitions beforehand, which is typically hard to estimate and not required in our setting.…”

Section: Related Workmentioning

confidence: 99%

“…Thus, it is amenable to visualization and other graph analysis techniques (e.g., finding communities, customer segments); specifically, it provides insight into the highlevel structure of the graph, and the dominant relationships among the various node clusters. Unlike clustering algorithms [17,8] that group nodes based on their similarity or distances, our summary is computed using information-theoretic principles.…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Graph summarization with bounded error

Navlakha

Rastogi

Shrivastava³

2008

Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data

287

349

View full text Add to dashboard Cite

We propose a highly compact two-part representation of a given graph G consisting of a graph summary and a set of corrections. The graph summary is an aggregate graph in which each node corresponds to a set of nodes in G, and each edge represents the edges between all pair of nodes in the two sets. On the other hand, the corrections portion specifies the list of edge-corrections that should be applied to the summary to recreate G. Our representations allow for both lossless and lossy graph compression with bounds on the introduced error. Further, in combination with the MDL principle, they yield highly intuitive coarse-level summaries of the input graph G. We develop algorithms to construct highly compressed graph representations with small sizes and guaranteed accuracy, and validate our approach through an extensive set of experiments with multiple reallife graph data sets.To the best of our knowledge, this is the first work to compute graph summaries using the MDL principle, and use the summaries (along with corrections) to compress graphs with bounded error.

show abstract

Section: Methodsmentioning

confidence: 99%

Section: Related Workmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Graph summarization with bounded error

Navlakha

Rastogi

Shrivastava³

2008

Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data

287

349

View full text Add to dashboard Cite

show abstract

“…Cannot-link constraints (like charges) increase the weights of affected edge. The situation is illustrated in Figure 1, where must-link constraints are {e (2,3), e (6,7), e (14,15)} and the cannot-link constraints are {e(9,10)}.…”

Section: Magnetically Affected Paths (Map)mentioning

confidence: 99%

Applying Electromagnetic Field Theory Concepts to Clustering with Constraints

Hakkoymaz

Chatzimilioudis

Gunopulos

et al. 2009

Machine Learning and Knowledge Discovery in Databases

View full text Add to dashboard Cite

Abstract. This work shows how concepts from the electromagnetic field theory can be efficiently used in clustering with constraints. The proposed framework transforms vector data into a fully connected graph, or just works straight on the given graph data. User constraints are represented by electromagnetic fields that affect the weight of the graph's edges. A clustering algorithm is then applied on the adjusted graph, using k-distinct shortest paths as the distance measure. Our framework provides better accuracy compared to MPCK-Means, SS-KernelKMeans and Kmeans+Diagonal Metric even when very few constraints are used, significantly improves clustering performance on some datasets that other methods fail to partition successfully, and can cluster both vector and graph datasets. All these advantages are demonstrated through thorough experimental evaluation.

show abstract

“…These algorithms use the eigenvectors of the affinity matrix that encodes the local data structure to make a global decision of clustering. They demonstrate promising performance in the domains of image segmentation [95], document clustering [35], data mining [112], dimensionality reduction [16], and semi-supervised learning [15] [116].…”

Section: Spectral Graph Partitioningmentioning

confidence: 99%

Kernel-based clustering and low rank approximation

Zhang¹

View full text Add to dashboard Cite

Authorization Page ii Signature Page iiiAcknowledgments iv Table of Contents v List of Figures viii List of Tables ix Abstract xChapter 1 Introduction 1 3.5 Embedding of the digits 0 and 1 obtained by KPCA (using the three leading eigenvectors) and our method (using only 6 representatives). 643.6 Comparison of clustering performance for spectral clustering and the proposed method on one digit classification task (MNIST digits 6 and 8). ABSTRACTClustering is an unsupervised data exploration scenario that is of fundamental importance to pattern recognition and machine learning. This thesis involves two types of clustering paradigms, the mixture models and graph-based clustering methods, with the primary focus on how to improve the scaling behavior of related algorithms for large-scale application. With regard to mixture models, we are interested in reducing the model complexity in terms of number of components. We propose a unified algorithm to simultaneously solve "model simplification" and "component clustering", and apply it with success in a number of learning algorithms using mixture models, such as density based clustering and SVM testing.For graph-based clustering, we propose the density weighted Nyström method for solving large scale eigenvalue problems, which demonstrates encouraging performance in the normalized-cut and kernel principal component analysis. We further extend this to the low rank approximation of kernel matrices, which is the key component to scaling up the kernel machines. We provide an error analysis on the Nyström low rank approximation, based on which a new sampling scheme is proposed. Our scheme is very efficient and numerically outperforms a number of state-of-the-art approaches such as incomplete Cholesky decomposition, the standard Nyström method, and probabilistic sampling approaches.

show abstract

A fast kernel-based multilevel algorithm for graph clustering

Cited by 93 publications

References 11 publications

Graph summarization with bounded error

Graph summarization with bounded error

Applying Electromagnetic Field Theory Concepts to Clustering with Constraints

Kernel-based clustering and low rank approximation

Contact Info

Product

Resources

About