2020
DOI: 10.1142/s0219622019300064

Clustering Categorical Data: A Survey

Abstract: Clustering is a complex unsupervised method used to group most similar observations of a given dataset within the same cluster. To guarantee high efficiency, the clustering process should ensure high accuracy and low complexity. Many clustering methods were developed in various fields depending on the type of application and the data type considered. Categorical clustering considers segmenting a dataset in which the data are categorical and were widely used in many real-world applications. Thus several methods…

Cited by 19 publications (13 citation statements)
References 80 publications
“…Notably, using only the WoS database is considered sufficient for retrieving clustering-related articles. Additionally, many review papers rely on the official website [17][18][19][24] or the publisher's website [20][21][22] as their primary source. In another study, Wang et al [23] retrieved articles indexed by Google Scholar and the WoS database.…”
Section: Methods (mentioning)
confidence: 99%
“…For instance, ref. [17] categorizes clustering into hard, fuzzy, and rough set clustering. Another taxonomy, proposed by [18], classifies distance or similarity metrics for categorical data, distinguishing similarity into context-sensitive and context-free, with the context-free category comprising probabilistic, information-theoretic, and frequency-based approaches.…”
Section: Introduction (mentioning)
confidence: 99%
“…Clustering algorithms can be broadly classified into two categories, namely, hierarchical and partitional methods [11]. Hierarchical clustering methods assume a hierarchical structure between clusters and recursively find nested clusters.…”
Section: Literature Review (mentioning)
confidence: 99%
“…Converting continuous variables into categorical ones generally involves binning into a pre-determined number of representative categories, which then allows for the application of clustering methods developed for categorical data such as CACTUS [7], Squeezer [8], Clarke et al's [9] ensemble method, and a plethora of others [10]. However, this form of discretization results in loss of information and poses a new non-trivial challenge of selecting an appropriate discretization scheme, a critical choice that directly determines the resulting similarity matrix and/or clustering [3].…”
Section: Variable Conversion (mentioning)
confidence: 99%