Fast Communication-Efficient Spectral Clustering over Distributed Data

Yan, Donghui; Wang, Yingjie; Wang, Jin; Wu, Guodong

doi:10.1109/tbdata.2019.2907985

Cited by 6 publications

(2 citation statements)

References 61 publications

(82 reference statements)

“…Another line of closely related work are those under the term "learning over inherently distributed data" [45,46]. Instead of dividing the data, these work deal with situations where the data are already distributed, i.e., stored at a number of distributed machines as a result of business operation or diverse data collection channels.…”

Section: Related Work (mentioning)

confidence: 99%

“…We will use recursive random projections [13,44] to produce a compressed signature for each partition. The idea of recursive random projections has been successfully applied in fast approximate spectral clustering [43], computing over distributed data [45,46], and other procedures. Since our approach is a divide-and-conquer method with a representation compression, we refer to it as divide-compress-and-conquer, or DC² in short.…”

Section: Introduction (mentioning)

confidence: 99%

DC²: A Divide-and-conquer Algorithm for Large-scale Kernel Learning with Application to Clustering

Wang, Bian, Liu, et al. 2019

2019 IEEE International Conference on Big Data (Big Data)

Self Cite

Divide-and-conquer is a general strategy for dealing with large-scale problems. It is typically applied to generate ensemble instances, which potentially limits the problem size it can handle. Additionally, the data are often divided by random sampling, which may be suboptimal. To address these concerns, we propose the DC² algorithm. Instead of ensemble instances, we produce structure-preserving signature pieces to be assembled and conquered. DC² achieves the efficiency of sampling-based large-scale kernel methods while enabling parallel multicore or clustered computation. The data partition and subsequent compression are unified by recursive random projections. Empirically, dividing the data by random projections induces smaller mean squared approximation errors than conventional random sampling. The power of DC² is demonstrated by our clustering algorithm rpfCluster+, which is as accurate as some of the fastest approximate spectral clustering algorithms while maintaining a running time close to that of K-means clustering. Analysis of DC² when applied to spectral clustering shows that the loss in clustering accuracy due to data division and reduction is upper bounded by the data approximation error, which would vanish with recursive random projections. Due to its easy implementation and flexibility, we expect DC² to be applicable to general large-scale learning problems.
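The abstract gives no algorithmic detail, so the following is only a rough illustration of what "partition by recursive random projections, then compress each piece" could look like. It is not the authors' DC² or rpfCluster+ implementation. The function names, the median split rule, the leaf-size threshold, and the use of leaf means as signatures are all assumptions made for this sketch.

```python
import numpy as np


def rp_partition(X, max_leaf_size, rng):
    """Recursively split the rows of X along random directions.

    Assumption: each node is split at the median of its projections,
    which keeps the pieces balanced. The source does not state the rule.
    """
    def split(idx):
        if len(idx) <= max_leaf_size:
            return [idx]
        direction = rng.standard_normal(X.shape[1])  # random projection direction
        order = np.argsort(X[idx] @ direction, kind="stable")
        half = len(idx) // 2
        return split(idx[order[:half]]) + split(idx[order[half:]])

    return split(np.arange(X.shape[0]))


def leaf_signatures(X, leaves):
    """Compress each leaf to its mean point and its size (hypothetical signature)."""
    centers = np.vstack([X[leaf].mean(axis=0) for leaf in leaves])
    sizes = np.array([len(leaf) for leaf in leaves])
    return centers, sizes


# Usage on synthetic data: 1000 points in 5 dimensions, leaves of at most 50 points.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
leaves = rp_partition(X, max_leaf_size=50, rng=rng)
centers, sizes = leaf_signatures(X, leaves)
```

In a scheme of this kind, a clustering algorithm would then run on the much smaller set of `centers` (possibly weighted by `sizes`), and each original point would inherit the label of its leaf. Whether DC² assembles its signatures in this way is not stated in the abstract.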

A Novel Hybrid Clustering Analysis Based on Combination of K-Means and PSO Algorithm

Krishna, Devarakonda, Al-Shamri, et al. 2022

Data Intelligence and Cognitive Informatics

Cost-sensitive selection of variables by ensemble of model sequences

Yan, Qin, Gu, et al. 2021

Knowl Inf Syst
