2018
DOI: 10.1145/3132088

Systematic Review of Clustering High-Dimensional and Large Datasets

Abstract: Technological advancement has enabled us to store and process huge amounts of data in relatively short spans of time. The nature of data is rapidly changing; in particular, data are increasingly multi- and high-dimensional. There is an immediate need to expand our focus to include the analysis of high-dimensional and large datasets. Data analysis is becoming a mammoth task, due to the incremental increase in data volume and to complexity in terms of the heterogeneity of data. It is due to this dynamic computing envir…


Cited by 60 publications (30 citation statements)
References: 133 publications
“…SubCLU was proposed by Kailing et al. in 2004, and it remains one of the best subspace clustering algorithms to date [10]. Its subspace search is bottom-up: it starts from one-dimensional subspaces, gradually extends to multidimensional subspaces, and finds density-based clusters in all subspaces.…”
Section: B. SubCLU
confidence: 99%
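A minimal sketch of the bottom-up search described in the citation above, assuming scikit-learn is available. It is not the full SubCLU algorithm: it runs DBSCAN in every one-dimensional subspace and then, Apriori-style, extends only the subspaces that still contain density-based clusters. The helper name and all parameter values are illustrative assumptions.

# Bottom-up subspace clustering sketch (simplified stand-in for SubCLU).
from itertools import combinations
import numpy as np
from sklearn.cluster import DBSCAN

def bottom_up_subspace_clusters(X, eps=0.3, min_samples=5, max_dim=3):
    """Return {subspace (tuple of feature indices): DBSCAN labels} for every
    candidate subspace that contains at least one density-based cluster."""
    results = {}
    # Start with all one-dimensional subspaces.
    current = []
    for d in range(X.shape[1]):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X[:, [d]])
        if (labels != -1).any():            # at least one cluster found
            results[(d,)] = labels
            current.append((d,))
    # Extend dimension by dimension, keeping only promising candidates.
    for k in range(2, max_dim + 1):
        dims = sorted({d for sub in current for d in sub})
        candidates = [c for c in combinations(dims, k)
                      if all(s in results for s in combinations(c, k - 1))]
        current = []
        for c in candidates:
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X[:, list(c)])
            if (labels != -1).any():
                results[c] = labels
                current.append(c)
        if not current:
            break
    return results

# Example on synthetic data: two Gaussian clusters in a 4-dimensional space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (100, 4)), rng.normal(1.0, 0.1, (100, 4))])
print(sorted(bottom_up_subspace_clusters(X).keys()))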
“…Researchers have proposed many clustering algorithms in recent decades [5][6][7][8][9], and these algorithms can be roughly divided into five categories [10]: (1) partition-based algorithms such as K-means [11] and k-medoids [12]; (2) hierarchical algorithms such as BIRCH [13], CURE [14], and CHAMELEON [15]; (3) density-based algorithms such as DBSCAN [16], OPTICS [17], and DENCLUE [18]; (4) grid-based algorithms such as STING [19] and OptiGrid [20]; (5) model-based algorithms such as EM [21] and COBWEB [22]. The algorithms mentioned above can meet the needs of clustering small, low-dimensional datasets.…”
Section: Introduction
confidence: 99%
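Four of the five families listed in this citation have representative implementations in scikit-learn (grid-based methods such as STING and OptiGrid do not). A small comparison sketch on synthetic data, with illustrative parameter choices:

# One representative per family: partitioning, hierarchical, density-based, model-based.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, Birch, DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

X, y_true = make_blobs(n_samples=500, centers=4, cluster_std=0.6, random_state=42)

models = {
    "partitioning (k-means)": KMeans(n_clusters=4, n_init=10, random_state=42),
    "hierarchical (BIRCH)": Birch(n_clusters=4),
    "density-based (DBSCAN)": DBSCAN(eps=0.8, min_samples=10),
    "model-based (EM / GMM)": GaussianMixture(n_components=4, random_state=42),
}

for name, model in models.items():
    labels = model.fit_predict(X)                   # cluster assignments
    ari = adjusted_rand_score(y_true, labels)       # agreement with ground truth
    print(f"{name:24s} ARI = {ari:.2f}")

On small, low-dimensional, well-separated data all four families perform well, which is exactly the regime the citation says these classical algorithms were designed for.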
“…The main characteristics of a clustering algorithm include: (1) scalability, i.e., the ability to manage a growing number of individuals in a limited period of time; (2) adaptability, i.e., the ability to identify clusters of different kinds; (3) self-driven operation, i.e., it should require no knowledge of the problem domain; (4) stability, meaning the algorithm is not affected by the presence of noise and/or outliers; and (5) data-independency, i.e., the algorithm should not be affected by the organization of individuals in the dataset [43].…”
Section: Parallel Clustering Algorithms
confidence: 99%
“…This paper selects K-Means (KM) to study cluster quality, execution time, speed-up, memory utilization, and scalability under a big-data mining setup, considering initial centroid initialization. KM clustering is widely adopted for segmentation, text mining, bioinformatics, wireless sensor networks, the financial discipline, data compression, texture segmentation, computer vision, vector quantization, etc. (Pandove et al., 2018; Xie et al., 2019).…”
Section: Introduction
confidence: 99%
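A hedged sketch of the kind of experiment this citation describes: compare K-Means under random versus k-means++ centroid initialization and report cluster quality and execution time. The dataset size and all parameter values are illustrative assumptions, not figures from the paper.

# Timing and quality comparison of K-Means centroid initialization strategies.
import time
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=100_000, centers=8, n_features=16, random_state=0)

for init in ("random", "k-means++"):
    start = time.perf_counter()
    km = KMeans(n_clusters=8, init=init, n_init=5, random_state=0).fit(X)
    elapsed = time.perf_counter() - start
    # Silhouette is estimated on a subsample to keep the metric itself cheap.
    sil = silhouette_score(X, km.labels_, sample_size=5_000, random_state=0)
    print(f"init={init:10s} time={elapsed:6.2f}s "
          f"inertia={km.inertia_:.3e} silhouette={sil:.3f}")

Memory utilization and speed-up across multiple workers would need additional tooling (e.g. a memory profiler and a distributed K-Means implementation), which is outside the scope of this sketch.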