2017
DOI: 10.1177/1748301817735665

A MapReduce-based improvement algorithm for DBSCAN

Abstract: This paper proposes an improved adaptive density-based spatial clustering of applications with noise (DBSCAN) algorithm based on a genetic algorithm and the MapReduce parallel programming framework. It addresses the poor clustering quality and low efficiency that DBSCAN exhibits when its parameters are chosen empirically. The density threshold minPts and the scan radius Eps are set through iterative genetic-algorithm optimization, and secondary reduction processing is then carried out with th…
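The abstract names the two DBSCAN parameters that the genetic algorithm tunes, Eps and minPts, but the excerpt is cut off before the fitness function and genetic operators are described. The Python sketch below therefore only illustrates the general idea under stated assumptions: a plain single-machine DBSCAN, a hypothetical fitness (mean silhouette of the clustered points, scaled by the fraction of points that are not noise), and a simple genetic loop with truncation selection, averaging/choice crossover and bounded mutation. None of these choices, nor the names `dbscan`, `fitness` or `genetic_search`, come from the paper, and the MapReduce parallelization is not shown.

```python
import math
import random


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Single-machine DBSCAN; returns one label per point, with -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding


    return labels


def fitness(points, labels):
    """Hypothetical fitness: mean silhouette of non-noise points, scaled by the
    fraction of points that are clustered. Fewer than two clusters scores -1."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for lab, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    clustered = sum(len(m) for m in clusters.values())
    return sum(scores) / len(scores) * clustered / len(points)


def clamp(x, lo, hi):
    return min(max(x, lo), hi)


def genetic_search(points, eps_range, min_pts_range, pop_size=12,
                   generations=15, mutation_rate=0.3, seed=0):
    """Evolve (eps, min_pts) pairs; the fitter half survives each generation."""
    rng = random.Random(seed)
    cache = {}

    def score(ind):
        if ind not in cache:
            cache[ind] = fitness(points, dbscan(points, *ind))
        return cache[ind]

    population = [(rng.uniform(*eps_range), rng.randint(*min_pts_range))
                  for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            (e1, m1), (e2, m2) = rng.sample(parents, 2)
            eps, min_pts = (e1 + e2) / 2, rng.choice([m1, m2])  # crossover
            if rng.random() < mutation_rate:
                spread = 0.1 * (eps_range[1] - eps_range[0])
                eps = clamp(eps + rng.gauss(0, spread), *eps_range)
            if rng.random() < mutation_rate:
                min_pts = clamp(min_pts + rng.choice([-1, 1]), *min_pts_range)
            children.append((eps, min_pts))
        population = parents + children
    return max(population, key=score)


rng = random.Random(1)
data = ([(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(30)]
        + [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(30)])
best_eps, best_min_pts = genetic_search(data, (0.1, 2.0), (2, 8))
labels = dbscan(data, best_eps, best_min_pts)
print(best_eps, best_min_pts, len(set(labels) - {-1}))
```

On two well-separated synthetic blobs the search settles on an (Eps, minPts) pair that recovers the two clusters. Every fitness evaluation reruns DBSCAN, so on large data this repeated clustering is the kind of cost a parallel framework such as MapReduce would be used to spread out.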

Cited by 22 publications (8 citation statements)
References 14 publications
“…It also aids computational biologists who are testing and benchmarking new clustering algorithms, evaluation metrics and pre- or post-processing steps [10]. Future iterations of hypercluster could include further cutting-edge clustering techniques, including those designed for larger data sets [31,32] or account for multiple types of data [48]. Hypercluster streamlines comparative unsupervised clustering, allowing the prioritization of both convenience and rigor.…”
Section: Discussion (mentioning)
confidence: 99%
“…Typically, the effect of hyperparameter choice on the quality of clustering results cannot be described with a convex function, meaning that hyperparameters should be chosen through exhaustive grid search [29], a slow and cumbersome process. Software packages for automatic hyperparameter tuning and model selection for regression and classification exist, notably auto-sklearn from AutoML [30], and some groups have made excellent tools for distributing a single clustering calculation for huge datasets [31,32], but to the best of our knowledge, there is no package for comparing several clustering algorithms and hyperparameters.…”
Section: Introduction (mentioning)
confidence: 99%
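The point that clustering quality is generally not a convex function of the hyperparameters can be illustrated with a small Python sketch. `toy_score` is a made-up quality surface over hypothetical (eps, min_pts) values with a lower local peak and a higher global peak. A greedy neighbour search started at the local peak stops there, while exhaustive grid search over the same grid finds the global optimum at the price of scoring every combination. The function names and the grid are illustrative assumptions, not part of either paper.

```python
import itertools
import math


def grid_search(score, grid):
    """Score every combination in `grid` (name -> candidate values); keep the best."""
    names = list(grid)
    best_params, best_value = None, -math.inf
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        value = score(**params)
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value


def hill_climb(score, grid, start):
    """Greedy search that moves one grid step at a time while the score improves."""
    names = list(grid)
    idx = [grid[n].index(start[n]) for n in names]
    current = score(**start)
    while True:
        best_move, best_value = None, current
        for d, step in itertools.product(range(len(names)), (-1, 1)):
            cand = list(idx)
            cand[d] += step
            if 0 <= cand[d] < len(grid[names[d]]):
                value = score(**{n: grid[n][i] for n, i in zip(names, cand)})
                if value > best_value:
                    best_move, best_value = cand, value
        if best_move is None:  # no neighbour is better: a (possibly local) peak
            return {n: grid[n][i] for n, i in zip(names, idx)}, current
        idx, current = best_move, best_value


def toy_score(eps, min_pts):
    """Made-up non-convex surface: local peak at (0.5, 3), global peak at (1.5, 6)."""
    local_peak = 0.6 * math.exp(-((eps - 0.5) ** 2 / 0.05 + (min_pts - 3) ** 2 / 2))
    global_peak = 1.0 * math.exp(-((eps - 1.5) ** 2 / 0.05 + (min_pts - 6) ** 2 / 2))
    return local_peak + global_peak


grid = {"eps": [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75], "min_pts": list(range(2, 9))}
print(hill_climb(toy_score, grid, {"eps": 0.5, "min_pts": 3})[0])  # {'eps': 0.5, 'min_pts': 3}
print(grid_search(toy_score, grid)[0])  # {'eps': 1.5, 'min_pts': 6}
```

The cost of grid search grows multiplicatively with each hyperparameter added (here 7 × 7 = 49 evaluations), which is why the quoted passage calls it slow and cumbersome.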
“…Gotz et al [43] present HPDBSCAN, an algorithm for both shared-memory and distributed-memory based on partitioning the data among processors, running DBSCAN locally on each partition, and then merging the clusters together. Exact and approximate distributed DBSCAN algorithms have been designed using the MapReduce [7,34,39,51,53,63,90,92] and Spark [32,49,54,68,69,82] paradigms. RP-DBSCAN [82], which is an approximate DBSCAN algorithm, has been shown to be the state-of-the-art for MapReduce and Spark.…”
Section: Related Work (mentioning)
confidence: 99%
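The partition, local DBSCAN and merge pattern described for HPDBSCAN and the MapReduce variants can be sketched in Python as below. This is a simplified single-process illustration, not HPDBSCAN, RP-DBSCAN or the paper's MapReduce algorithm. The data are split into vertical strips along the x axis. Each strip also receives an Eps-wide halo, so that the core status of the points it owns is exact (the "map" step). A union-find "reduce" step then merges local clusters that share a point which is core according to its owning strip. Border points keep every local cluster that reaches them so that no core-to-core link is lost. The names `local_dbscan` and `partitioned_dbscan` and the strip layout are hypothetical.

```python
import math
from collections import defaultdict


def local_dbscan(pts, eps, min_pts):
    """Map step: DBSCAN on one partition. Returns per-point core flags and, for
    each point, the set of local cluster ids reaching it (borders may get several)."""
    nbrs = [[j for j in range(len(pts)) if math.dist(pts[i], pts[j]) <= eps]
            for i in range(len(pts))]
    core = [len(n) >= min_pts for n in nbrs]
    assigned = [False] * len(pts)
    members = [set() for _ in pts]
    next_id = 0
    for i in range(len(pts)):
        if not core[i] or assigned[i]:
            continue
        assigned[i] = True
        stack = [i]
        while stack:  # expand one cluster through density-connected core points
            j = stack.pop()
            for k in nbrs[j]:
                members[k].add(next_id)
                if core[k] and not assigned[k]:
                    assigned[k] = True
                    stack.append(k)
        next_id += 1
    return core, members


def partitioned_dbscan(points, eps, min_pts, n_parts):
    """Split along x into strips with an eps halo, cluster each strip, then merge."""
    lo, hi = min(p[0] for p in points), max(p[0] for p in points)
    width = (hi - lo) / n_parts or 1.0
    global_core = [False] * len(points)
    memberships = defaultdict(set)  # point index -> {(strip, local cluster id)}
    for part in range(n_parts):
        start, end = lo + part * width, lo + (part + 1) * width
        idx = [i for i, p in enumerate(points) if start - eps <= p[0] <= end + eps]
        core, members = local_dbscan([points[i] for i in idx], eps, min_pts)
        for local_i, i in enumerate(idx):
            x = points[i][0]
            if start <= x and (x < end or part == n_parts - 1):
                # owned point: all its eps-neighbours lie in strip + halo
                global_core[i] = core[local_i]
            memberships[i].update((part, c) for c in members[local_i])

    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Reduce step: local clusters sharing a globally core point are one cluster.
    for i, clusters in memberships.items():
        for c in clusters:
            parent.setdefault(c, c)
        if global_core[i]:
            roots = sorted({find(c) for c in clusters})
            for r in roots[1:]:
                parent[r] = roots[0]

    labels, compact = [-1] * len(points), {}
    for i in range(len(points)):
        if memberships[i]:
            root = min(find(c) for c in memberships[i])
            labels[i] = compact.setdefault(root, len(compact))
    return labels


chain = [(float(x), 0.0) for x in range(10)]
blob = [(0.0, 10.0), (0.5, 10.0), (1.0, 10.0)]
data = chain + blob + [(5.0, 20.0)]  # last point is isolated noise
print(partitioned_dbscan(data, 1.0, 3, n_parts=3))
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, -1]
```

The chain crosses all three strips, yet the merge step reunites its pieces into one cluster, matching a single-partition run. Each strip only needs its own points plus the halo, so strips could run as independent map tasks, with only the small membership lists sent to the reduce step. The systems named in the quotation use their own, more elaborate partitioning schemes.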
“…The expression levels of a gene across multiple experimental settings are referred to as a gene expression profile, whereas the expression levels of all genes in a sample are referred to as a sample expression profile. Researchers can evaluate the expression levels of a large number of genes in a variety of samples and settings by using microarrays [3]. The information gathered from them is referred to as gene expression data.…”
Section: Introduction (mentioning)
confidence: 99%
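The distinction between the two kinds of profile is easiest to see on an expression matrix with genes as rows and samples as columns. A gene expression profile is then a row and a sample expression profile is a column. The Python sketch below uses made-up gene names, sample names and values purely for illustration.

```python
# Hypothetical toy data: rows are genes, columns are samples (values are made up).
genes = ["GENE_A", "GENE_B", "GENE_C"]
samples = ["s1", "s2", "s3", "s4"]
expression = [
    [2.1, 2.3, 0.4, 0.5],  # GENE_A across s1..s4
    [0.9, 1.0, 3.2, 3.0],  # GENE_B
    [1.5, 1.4, 1.6, 1.5],  # GENE_C
]


def gene_profile(gene):
    """Expression of one gene across all samples: a row of the matrix."""
    return expression[genes.index(gene)]


def sample_profile(sample):
    """Expression of all genes in one sample: a column of the matrix."""
    j = samples.index(sample)
    return [row[j] for row in expression]


print(gene_profile("GENE_B"))  # [0.9, 1.0, 3.2, 3.0]
print(sample_profile("s3"))    # [0.4, 3.2, 1.6]
```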