PQk-means

Matsui, Yusuke; Ogaki, Keisuke; Yamasaki, Toshihiko; Aizawa, Kiyoharu

doi:10.1145/3123266.3123430

Cited by 18 publications

(5 citation statements)

References 30 publications

“…• PQK-means [36]: Based on symmetric distance, PQK-means assigns each sample to the nearest cluster center. It then iteratively updates the cluster centers and sample labels.…”

Section: Compared Methods (mentioning)
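The assignment/update loop described in this snippet can be sketched entirely in code space. This is a minimal sketch, not the authors' implementation. It assumes that the symmetric distance between two PQ codes is the sum of precomputed sub-codeword distance tables, and that each center is updated subspace by subspace through exhaustive search over the K sub-codewords, so no vector is ever decoded. The helper names (`build_tables`, `sdc`, `pqkmeans_iteration`) are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Code = Tuple[int, ...]  # one sub-codeword index per subspace


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two plain vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def build_tables(codebooks: List[List[List[float]]]) -> List[List[List[float]]]:
    """T[m][i][j] = squared distance between sub-codewords i and j of subspace m."""
    return [[[sq_dist(ci, cj) for cj in cb] for ci in cb] for cb in codebooks]


def sdc(a: Code, b: Code, tables) -> float:
    """Squared symmetric distance: both arguments are PQ codes, only table lookups."""
    return sum(tables[m][a[m]][b[m]] for m in range(len(a)))


def pqkmeans_iteration(codes: List[Code], centers: List[Code], tables):
    """One assignment step plus one update step, both carried out on codes."""
    # Assignment: each code goes to the center with the smallest symmetric distance.
    labels = [min(range(len(centers)), key=lambda k: sdc(c, centers[k], tables))
              for c in codes]
    new_centers = []
    for k, old in enumerate(centers):
        members = [c for c, lab in zip(codes, labels) if lab == k]
        if not members:  # empty cluster: keep its previous center
            new_centers.append(old)
            continue
        center = []
        for m, table in enumerate(tables):
            # Histogram of the members' sub-codes in subspace m.
            votes = Counter(c[m] for c in members)
            # The summed squared distance separates by subspace, so each
            # sub-code is chosen independently among the K candidates.
            best = min(range(len(table)),
                       key=lambda j: sum(n * table[j][i] for i, n in votes.items()))
            center.append(best)
        new_centers.append(tuple(center))
    return labels, new_centers
```

With two 1-D subspaces whose codebooks are `[[0.0], [10.0]]`, the codes `(0, 0), (0, 0), (1, 1), (1, 1)` started from centers `(0, 0)` and `(0, 1)` are labelled `[0, 0, 1, 1]`, and the second center moves to `(1, 1)` after one iteration.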

“…Similarly, Shen et al [35] introduced auxiliary binary representation and designed an objective function based on Hamming distance. Matsui et al [36] compressed input vectors into short product quantization codes, achieving fast and efficient clustering. These methods significantly reduce data storage space through quantization, reduce clustering computation and processing time, and provide effective solutions for handling large-scale datasets.…”

Section: Related Work (mentioning)
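Compressing an input vector into a short product quantization code, as the snippet describes, amounts to splitting the vector into M sub-vectors and storing, for each one, the index of its nearest sub-codeword. The sketch below assumes the codebooks are already trained (for example by k-means in each subspace) and uses toy values. With K = 256 sub-codewords per subspace, each index fits in one byte, so a code occupies M bytes.

```python
from typing import List, Sequence, Tuple


def split(x: Sequence[float], m: int) -> List[Sequence[float]]:
    """Cut a D-dimensional vector into m equal-length sub-vectors."""
    d = len(x) // m
    assert d * m == len(x), "dimension must be divisible by the number of subspaces"
    return [x[i * d:(i + 1) * d] for i in range(m)]


def encode(x: Sequence[float], codebooks) -> Tuple[int, ...]:
    """Replace each sub-vector by the index of its nearest sub-codeword."""
    code = []
    for sub, cb in zip(split(x, len(codebooks)), codebooks):
        dists = [sum((a - b) ** 2 for a, b in zip(sub, c)) for c in cb]
        code.append(dists.index(min(dists)))
    return tuple(code)


# Toy codebooks (an assumption for illustration): D = 4, M = 2, K = 2.
codebooks = [
    [[0.0, 0.0], [5.0, 5.0]],
    [[1.0, 1.0], [-1.0, -1.0]],
]
print(encode([4.9, 5.2, -0.8, -1.1], codebooks))  # (1, 1)
```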

Large-Scale Stream K-means Based on Product-Quantized Codes

Hang, Yin, et al. 2024. Preprint.

In recent years, clustering large-scale data streams has become a research hotspot in the field of data mining. However, most existing methods for large-scale data streams suffer from slow speed and insufficient memory, and lack mechanisms for detecting and responding to concept drift. In this paper, a Large-Scale Stream K-means based on product-quantized codes (LS2K-means) is proposed. By first introducing product quantization codes into the framework of incremental clustering methods, the method reduces memory consumption through dimensionality reduction of the data. Additionally, a new similarity measurement method is introduced, greatly improving the efficiency of distance calculation. A concept drift detection and response mechanism is also constructed: concept drift is detected quickly by comparing the consistency of clustering results, and a backtracking mechanism responds to it promptly, effectively improving the algorithm’s performance. The effectiveness of the proposed method is validated through simulations on six real datasets. The method handles concept drift efficiently and outperforms DenStream and EmCStream in execution efficiency.
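The abstract says drift is detected "by comparing the consistency of clustering results" but does not name the measure. The sketch below is a hypothetical illustration only. It scores consistency with the Rand index between two labelings of the same batch (for instance, labels under the previous centers versus labels after re-clustering) and flags drift below an assumed threshold of 0.8. The backtracking response is not shown.

```python
from itertools import combinations
from typing import Sequence


def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of sample pairs on which two labelings agree (same vs. different cluster)."""
    assert len(a) == len(b) and len(a) >= 2
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def drift_detected(old_labels: Sequence[int], new_labels: Sequence[int],
                   threshold: float = 0.8) -> bool:
    """Hypothetical rule: report drift when consistency falls below `threshold`."""
    return rand_index(old_labels, new_labels) < threshold


# Renaming clusters does not count as drift; reshuffling memberships does.
print(drift_detected([0, 0, 1, 1], [1, 1, 0, 0]))  # False
print(drift_detected([0, 0, 1, 1], [0, 1, 0, 1]))  # True
```

The Rand index is used here because it ignores cluster label permutations, which change arbitrarily between independent clustering runs.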

“…More precisely, an intermediate code is introduced to compress the original vector dataset to a lower dimension and short-length dataset. Based on this concept, a billion-scale memory-efficient clustering algorithm has been developed, called the PQk-means algorithm (Matsui et al, 2017).…”

Section: Vector Quantisation: PQk-means Algorithm (mentioning)
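The memory saving from replacing raw vectors by short codes follows from simple arithmetic. The figures below are illustrative assumptions, not settings reported in the cited papers: one billion 128-dimensional float32 vectors versus PQ codes with M = 4 subspaces and K = 256 sub-codewords (8 bits per index). Codebook storage, which is only K × D floats, is ignored.

```python
import math


def raw_bytes(n: int, dim: int, bytes_per_value: int = 4) -> int:
    """Memory for n uncompressed vectors (float32 by default)."""
    return n * dim * bytes_per_value


def pq_bytes(n: int, m: int, k: int = 256) -> int:
    """Memory for n PQ codes: m indices of ceil(log2 k) bits each, rounded up to whole bytes per code."""
    bits_per_code = m * math.ceil(math.log2(k))
    return n * math.ceil(bits_per_code / 8)


n = 10**9
print(raw_bytes(n, 128) / 1e9, "GB raw")    # 512.0 GB raw
print(pq_bytes(n, 4) / 1e9, "GB as codes")  # 4.0 GB as codes
```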

“…where $N_k = \sum_{i=1}^{N} I(C(i) = k)$ is the number of vectors of the $k$th cluster, and $d(\mathbf{c}_i, \mathbf{c}_k)$ is the distance between the $i$th code vector and $k$th code vector. With this algorithm, clustering a one billion dataset with 100,000 clusters is achieved in 14 h on a 32GB RAM personal computer (Matsui et al, 2017). Unlike other existing large-scale clustering methods, such as Bk-means (Gong et al, 2015) and IQkmeans (Avrithis et al, 2015), the original vector sample can also be reconstructed approximately from a given code, this being particularly efficient for large AIS trajectory data.…”

Section: Vector Quantisation: PQk-means Algorithm (mentioning)
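Two details from this snippet translate directly into code: the cluster size $N_k = \sum_{i=1}^{N} I(C(i) = k)$ and the approximate reconstruction of a vector from its code. The sketch assumes labels are integers in `[0, n_clusters)` and reconstructs a vector by concatenating the selected sub-codewords, which is the standard product quantization decoder. The toy codebooks are illustrative.

```python
from typing import List, Sequence


def cluster_sizes(labels: Sequence[int], n_clusters: int) -> List[int]:
    """N_k = sum_i I(C(i) = k): how many codes carry label k."""
    return [sum(1 for c in labels if c == k) for k in range(n_clusters)]


def decode(code: Sequence[int], codebooks) -> List[float]:
    """Approximate reconstruction: concatenate the selected sub-codewords."""
    out: List[float] = []
    for m, idx in enumerate(code):
        out.extend(codebooks[m][idx])
    return out


codebooks = [[[0.0, 0.0], [5.0, 5.0]], [[1.0, 1.0], [-1.0, -1.0]]]
print(cluster_sizes([0, 2, 2, 1, 2], 3))  # [1, 1, 3]
print(decode((1, 0), codebooks))          # [5.0, 5.0, 1.0, 1.0]
```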

“…However, the bottleneck issue we have to deal with is the large amount of incoming AIS data. As it has been shown that the k-means vector quantisation approach does not perform well in the setting of large datasets, we use a high-performance PQk-means algorithm recently introduced (Matsui et al, 2017). Overall, a combination of topic model and extended vector quantisation is applied to then extract specific navigation patterns for a given maritime region of interest.…”

Section: Introduction (mentioning)

Navigation pattern extraction from AIS trajectory big data via topic model

Fujino, Claramunt. 2023. J. Navigation.

This paper introduces a novel approach for extracting vessel navigation patterns from very large automatic identification system (AIS) trajectory big data. AIS trajectory data records are first converted to a series of code documents using vector quantisation algorithms such as k-means and PQk-means, whose performance is evaluated in terms of precision and computational time. A topic model is then applied to these code documents, from which vessels’ navigation patterns are extracted and identified. The potential of the proposed approach is illustrated by several experiments conducted with a practical AIS dataset in a region of North West France. These experimental results show that the proposed approach is well suited to mining AIS trajectory big data and outperforms common DBSCAN algorithms and Gaussian mixture models.
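The first stage of this pipeline, turning trajectory records into "code documents", can be illustrated with nearest-centroid quantisation. This is a hypothetical sketch. It assumes each AIS record is reduced to a (lat, lon) pair and quantised against centroids already learned by a method such as k-means or PQk-means. The token format `c<index>` and the coordinates are made up for illustration, and the topic model itself is not shown.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical AIS feature: (lat, lon)


def nearest(p: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
    return d.index(min(d))


def to_code_document(traj: List[Point], centroids) -> List[str]:
    """Quantise every record of one trajectory into a code 'word'."""
    return [f"c{nearest(p, centroids)}" for p in traj]


def bag_of_codes(doc: List[str]) -> Dict[str, int]:
    """Word counts, the usual input representation for a topic model such as LDA."""
    return dict(Counter(doc))


centroids = [(48.0, -4.5), (48.4, -5.0)]
doc = to_code_document([(48.01, -4.49), (48.39, -5.02), (48.41, -4.98)], centroids)
print(doc)                # ['c0', 'c1', 'c1']
print(bag_of_codes(doc))  # {'c0': 1, 'c1': 2}
```

Treating each trajectory as a document of code words is what lets a standard text topic model group trajectories that pass through similar quantised regions.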
