2020
DOI: 10.1007/s11227-020-03524-3
K-DBSCAN: An improved DBSCAN algorithm for big data

Abstract: Big data storage and processing are among the most important challenges now. Among data mining algorithms, DBSCAN is a common clustering method. One of the most important drawbacks of this algorithm is its low execution speed. This study aims to accelerate the DBSCAN execution speed so that the algorithm can respond to big datasets in an acceptable period of time. To overcome the problem, an initial grouping was applied to the data in this article through the K-means++ algorithm. DBSCAN was then employed to pe…
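The abstract describes a two-stage scheme: partition the data with K-means++ first, then run DBSCAN within each partition so that each density scan touches only a fraction of the points. A minimal sketch of that idea, using scikit-learn's `KMeans` and `DBSCAN` (the group count and density parameters below are illustrative assumptions, not values from the paper):

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN

def kdbscan_sketch(X, n_groups=4, eps=0.5, min_samples=5):
    """Sketch of K-means++ pre-grouping followed by per-group DBSCAN."""
    # Stage 1: coarse partition with K-means++ initialisation.
    groups = KMeans(n_clusters=n_groups, init="k-means++", n_init=10,
                    random_state=0).fit_predict(X)
    labels = np.full(len(X), -1)   # -1 = noise, as in DBSCAN
    next_label = 0
    # Stage 2: run DBSCAN independently inside each coarse group.
    for g in range(n_groups):
        idx = np.where(groups == g)[0]
        sub = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X[idx])
        # Relabel local cluster ids so they are globally unique.
        for local in np.unique(sub):
            if local == -1:
                continue
            labels[idx[sub == local]] = next_label
            next_label += 1
    return labels
```

The speed-up comes from shrinking each neighborhood query to one partition; a faithful implementation would also need to merge clusters that straddle partition boundaries, which this sketch omits.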

Cited by 57 publications (13 citation statements)
References 30 publications
“…Density-based spatial clustering of applications with noise (DBSCAN) is an unsupervised machine learning clustering algorithm [18]. There are two important parameters in the DBSCAN algorithm: Eps (ε) and MinPts, the former being the neighborhood radius when defining the density and the latter being the threshold value when defining the core point [19].…”
Section: DBSCAN Algorithm
confidence: 99%
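The two parameters named in the statement above can be seen directly with scikit-learn's `DBSCAN`, where they appear as `eps` and `min_samples` (the toy coordinates are illustrative, not from the cited work):

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # first dense group
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # second dense group
              [10.0, 0.0]])                          # isolated point

# eps (ε): neighborhood radius; min_samples (MinPts): core-point threshold.
labels = DBSCAN(eps=0.3, min_samples=3).fit_predict(X)
# → [0, 0, 0, 1, 1, 1, -1]: two clusters, isolated point flagged as noise
```

Each point in a dense group has at least `min_samples` neighbors (itself included) within radius `eps`, so it is a core point; the isolated point fails the threshold and receives the noise label −1.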
“…In order to measure the clustering results of the improved method, we use Accuracy, Davies-Bouldin index (DBI), Silhouette index (Sil), Rand index (RI) [41,42], Normalized Mutual Information (NMI), Homogeneity, Completeness, and V-measure [43].…”
Section: The Error Index
confidence: 99%
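All of the indices listed in that statement are available in `sklearn.metrics`. A minimal illustration on a toy labelling (the cited paper's datasets are not reproduced here; `y_true`, `y_pred`, and `X` below are made-up values):

```python
from sklearn import metrics

y_true = [0, 0, 0, 1, 1, 1]          # ground-truth labels
y_pred = [0, 0, 1, 1, 1, 1]          # clustering result with one error
X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]   # feature values

# External indices: compare predicted labels against the ground truth.
ri  = metrics.rand_score(y_true, y_pred)                      # Rand index
nmi = metrics.normalized_mutual_info_score(y_true, y_pred)    # NMI
h, c, v = metrics.homogeneity_completeness_v_measure(y_true, y_pred)

# Internal indices: judge cluster geometry from the features alone.
sil = metrics.silhouette_score(X, y_pred)        # higher is better
dbi = metrics.davies_bouldin_score(X, y_pred)    # lower is better
```

External indices need ground-truth labels, while the Silhouette and Davies-Bouldin indices only need the features and the predicted labels, which is why both families are typically reported together.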
“…These high-density data are divided into different clusters. Suitable clusters are selected as references for K-value selection [41].…”
Section: Description of the Positioning Algorithm
confidence: 99%