Proceedings of the 25th ACM International Conference on Multimedia 2017
DOI: 10.1145/3123266.3123430

PQk-means

Abstract: Data clustering is a fundamental operation in data analysis. For handling large-scale data, the standard k-means clustering method is not only slow, but also memory-inefficient. We propose an efficient clustering method for billion-scale feature vectors, called PQk-means. By first compressing input vectors into short product-quantized (PQ) codes, PQk-means achieves fast and memory-efficient clustering, even for high-dimensional vectors. Similar to k-means, PQk-means repeats the assignment and update steps, both of whic…
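
The compression step mentioned in the abstract can be illustrated with a minimal sketch of product quantization: a vector is split into equal-length sub-vectors, and each sub-vector is replaced by the index of its nearest codeword in a per-subspace codebook. The function names (`pq_encode`, `pq_decode`) and the toy codebook values are assumptions made for illustration; in practice the codebooks would be learned from data, and this is not the authors' implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vec: Vector, codebooks: List[List[Vector]]) -> List[int]:
    """Encode `vec` as one codeword index per subspace."""
    sub_dim = len(vec) // len(codebooks)  # assumes the dimension divides evenly
    code = []
    for j, codebook in enumerate(codebooks):
        sub = vec[j * sub_dim:(j + 1) * sub_dim]
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(sub, codebook[k]))
        code.append(nearest)
    return code


def pq_decode(code: Sequence[int], codebooks: List[List[Vector]]) -> List[float]:
    """Approximately reconstruct a vector by concatenating the selected codewords."""
    out: List[float] = []
    for j, k in enumerate(code):
        out.extend(codebooks[j][k])
    return out


# Hypothetical toy codebooks: 2 subspaces, 2 codewords each (not trained).
toy_codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]
toy_code = pq_encode([0.9, 1.2, 0.1, 0.8], toy_codebooks)
toy_reconstruction = pq_decode(toy_code, toy_codebooks)
```

In this toy setting a four-dimensional vector is stored as two small integers, which is where the memory saving comes from; the decoded vector is only an approximation of the input.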

Cited by 18 publications (5 citation statements)
References: 30 publications
“…• PQK-means [36]: Based on symmetric distance, PQK-means assigns each sample to the nearest cluster center. It then iteratively updates the cluster centers and sample labels.…”
Section: Compared Methods (mentioning)
confidence: 99%
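
The symmetric-distance assignment described in the statement above can be sketched as follows: the distance between two PQ codes is the sum, over subspaces, of precomputed squared distances between the corresponding codewords, and each sample code is assigned to the center code with the smallest such sum. The helper names (`build_tables`, `symmetric_distance`, `assign`) and the toy codebooks are hypothetical, chosen only for illustration; this is a sketch under those assumptions rather than the published implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]
Tables = List[List[List[float]]]


def build_tables(codebooks: List[List[Vector]]) -> Tables:
    """tables[j][a][b] = squared distance between codewords a and b of subspace j."""
    return [
        [[sum((x - y) ** 2 for x, y in zip(ca, cb)) for cb in book] for ca in book]
        for book in codebooks
    ]


def symmetric_distance(code_a: Sequence[int], code_b: Sequence[int], tables: Tables) -> float:
    """Distance between two PQ codes using table lookups only."""
    return sum(tables[j][a][b] for j, (a, b) in enumerate(zip(code_a, code_b)))


def assign(codes: List[Sequence[int]], centers: List[Sequence[int]], tables: Tables) -> List[int]:
    """Assignment step: label each code with the index of its nearest center code."""
    return [
        min(range(len(centers)), key=lambda k: symmetric_distance(code, centers[k], tables))
        for code in codes
    ]


# Hypothetical toy codebooks: 2 subspaces, 2 codewords each (not trained).
toy_codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (3.0, 1.0)],
]
toy_tables = build_tables(toy_codebooks)
toy_codes = [[0, 0], [1, 1], [1, 0]]
toy_centers = [[0, 0], [1, 1]]
toy_labels = assign(toy_codes, toy_centers, toy_tables)
```

Because the tables depend only on the codebooks, they can be built once and reused in every iteration, so the assignment step never touches the original vectors.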
“…Similarly, Shen et al [35] introduced auxiliary binary representation and designed an objective function based on Hamming distance. Matsui et al [36] compressed input vectors into short product quantization codes, achieving fast and efficient clustering. These methods significantly reduce data storage space through quantization, reduce clustering computation and processing time, and provide effective solutions for handling large-scale datasets.…”
Section: Related Work (mentioning)
confidence: 99%
“…More precisely, an intermediate code is introduced to compress the original vector dataset to a lower dimension and short-length dataset. Based on this concept, a billion-scale memory-efficient clustering algorithm has been developed, called the PQk-means algorithm (Matsui et al, 2017).…”
Section: Vector Quantisation: PQk-means Algorithm (mentioning)
confidence: 99%
“…where $N_k = \sum_{i=1}^{N} I(C(i) = k)$ is the number of vectors of the $k$th cluster, and $d(\mathbf{c}_i, \mathbf{c}_k)$ is the distance between the $i$th code vector and $k$th code vector. With this algorithm, clustering a one billion dataset with 100,000 clusters is achieved in 14 h on a 32GB RAM personal computer (Matsui et al, 2017). Unlike other existing large-scale clustering methods, such as Bk-means (Gong et al, 2015) and IQkmeans (Avrithis et al, 2015), the original vector sample can also be reconstructed approximately from a given code, this being particularly efficient for large AIS trajectory data.…”
Section: Vector Quantisation: PQk-means Algorithm (mentioning)
confidence: 99%
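
The cluster size N_k quoted above is an indicator sum over the assignments C(i). A minimal sketch of that count follows, assuming 0-based cluster labels (the quoted formula does not state its indexing) and a hypothetical helper name `cluster_sizes`.

```python
from typing import List, Sequence


def cluster_sizes(assignments: Sequence[int], num_clusters: int) -> List[int]:
    """N_k = number of samples i whose assignment C(i) equals k."""
    return [sum(1 for c in assignments if c == k) for k in range(num_clusters)]


# Hypothetical assignments for six samples and three clusters.
toy_assignments = [0, 2, 2, 1, 2, 0]
toy_sizes = cluster_sizes(toy_assignments, 3)
```

The sizes always sum to the number of samples, which is a convenient sanity check after each assignment step.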