A survey of kernel and spectral methods for clustering

Filippone, Maurizio; Camastra, Francesco; Masulli, Francesco; Rovetta, Stefano

doi:10.1016/j.patcog.2007.05.018

Cited by 737 publications

(388 citation statements)

References 68 publications

Supporting

Mentioning

380

Contrasting

Unclassified


“…As briefly addressed in Section 2.4, it involves an understanding of the mixing time (see, e.g., [17]) of the random walk defined in (8) for specific types of graphs. In particular, for a given data set, the performance of the developed method relies on the parameter α determining the diffusion distance in (9). Computational experimentation with test data sets reveals that the optimal choice of α tends to be robust for a broad variety of data set geometries.…”

Section: Discussion (mentioning)

confidence: 99%

“…The matrix P can be thought of as a transition matrix whose rows all sum to 1, and whose entry P_{i,j} corresponds to the probability of jumping from the node (data point) i to the node j in one time step. The j-th component of the vector P^α e, which is used in (9), is the probability of a random walk ending up in the j-th node, j = 1, 2, . .…”

Section: Geometric and Graph Interpretation of DifFUZZY (mentioning)

confidence: 99%
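The random-walk reading of P in the statement above can be illustrated with a small sketch. It is not the DifFUZZY construction: the weight matrix `W`, the row-normalisation step, and the helper names `transition_matrix` and `walk_distribution` are assumptions made for illustration only. The sketch uses the row-vector convention e^T P^α, which coincides with P^α e when P is symmetric.

```python
import numpy as np

def transition_matrix(weights):
    """Row-normalise a non-negative weight matrix so that each row sums to 1."""
    weights = np.asarray(weights, dtype=float)
    row_sums = weights.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every node needs at least one positive weight")
    return weights / row_sums

def walk_distribution(P, start, alpha):
    """Probability of being at each node after `alpha` steps from `start`."""
    e = np.zeros(P.shape[0])
    e[start] = 1.0  # indicator vector of the starting node
    # Row-vector convention: e^T P^alpha (equals P^alpha e for symmetric P).
    return e @ np.linalg.matrix_power(P, alpha)

# Hypothetical 3-node chain with self-loops (illustrative weights only).
W = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
P = transition_matrix(W)
dist = walk_distribution(P, start=0, alpha=3)
print(P.sum(axis=1))  # each row of P sums to 1
print(dist)           # a probability vector over the three nodes
```

With `alpha=1` the result is simply row `start` of P, which matches the one-step interpretation of P_{i,j} in the quoted text.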

“…Further examples are shown in the Supplementary Material. A classical example where the K-means algorithm fails (Filippone et al [9]) is shown in Fig. 2(a).…”

Section: Synthetic Test Data Sets (mentioning)

confidence: 99%


DifFUZZY: a fuzzy clustering algorithm for complex datasets

Cominetti, Matzavinos, Samarasinghe, et al. 2010. IJCIBSB


Soft (fuzzy) clustering techniques are often used in the study of high-dimensional data sets, such as microarray and other high-throughput bioinformatics data. The most widely used method is the Fuzzy C-means algorithm (FCM), but it can present difficulties when dealing with some data sets. A fuzzy clustering algorithm, DifFUZZY, is developed; it utilises concepts from diffusion processes in graphs and is applicable to a larger class of clustering problems than other fuzzy clustering algorithms. Examples of data sets (synthetic and real) for which this method outperforms other frequently used algorithms are presented, including two benchmark biological data sets, a genetic expression data set and a data set that contains taxonomic measurements. This method is better than traditional fuzzy clustering algorithms at handling data sets that are "curved" or elongated, or that contain clusters of different dispersion. The algorithm has been implemented in Matlab and C++ and is available at http://www.maths.ox.ac.uk/cmb/difFUZZY.


“…For example, two main methods are defined as partitioning and hierarchical clustering, where an optimization rule applied to define clusters for the former type and a recursive approach which results in dendrograms is introduced for the latter [7]. K-means clustering defines closeness as the metric for similarity to group data sets into clusters.…”

Section: B Nuclei Clusters (mentioning)

confidence: 99%

“…Structure of clusters is quantified by a variety of methods reported in literature [7], [8], [9]. Typically, the validity of clusters is evaluated by either the dispersion of data each cluster contains, or the data separation between clusters, or both [8].…”

Section: Introduction (mentioning)

confidence: 99%
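The two validity notions named in the statement above, dispersion within clusters and separation between clusters, can be sketched as follows. The quoted text does not say which indices [8] uses, so the definitions here (mean distance to the cluster centroid, and minimum distance between centroids) are simple stand-ins chosen for illustration, and the function name `dispersion_and_separation` is hypothetical.

```python
import numpy as np

def dispersion_and_separation(points, labels):
    """Return per-cluster dispersion and the minimum centroid separation.

    Assumes at least two distinct labels. These are illustrative
    definitions, not the indices used in the cited work.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([points[labels == k].mean(axis=0) for k in ids])
    # Dispersion: mean Euclidean distance of members to their own centroid.
    dispersion = np.array([
        np.linalg.norm(points[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    # Separation: smallest Euclidean distance between two distinct centroids.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    separation = dists[np.triu_indices(len(ids), k=1)].min()
    return dispersion, separation

# Toy data: two tight pairs of points, far apart from each other.
pts = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
lab = [0, 0, 1, 1]
disp, sep = dispersion_and_separation(pts, lab)
print(disp, sep)
```

A clustering with low dispersion and high separation would be judged well structured under this kind of criterion; published indices typically combine the two quantities into a single score.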