2018
DOI: 10.3390/a11110177
Understanding and Enhancement of Internal Clustering Validation Indexes for Categorical Data

Abstract: Clustering is one of the main tasks of machine learning. Internal clustering validation indexes (CVIs) are used to measure the quality of several clustered partitions to determine the local optimal clustering results in an unsupervised manner, and can act as the objective function of clustering algorithms. In this paper, we first studied several well-known internal CVIs for categorical data clustering, and proved the ineffectiveness of evaluating the partitions of different numbers of clusters without any inte…
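To make the selection-by-index workflow described in the abstract concrete, the sketch below scores several candidate partitions with an internal index and keeps the best one. The silhouette coefficient from scikit-learn is used only as a generic stand-in for the categorical-data CVIs the paper actually studies, and the synthetic data and parameter values are assumptions for illustration.

```python
# Minimal sketch: pick the number of clusters by maximizing an internal CVI.
# Assumption: silhouette_score stands in for a generic internal index; the
# paper's categorical-data indexes are not reproduced here.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

best_k, best_score = None, -np.inf
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # internal index: no ground-truth labels used
    if score > best_score:
        best_k, best_score = k, score

print(f"selected k = {best_k}, silhouette = {best_score:.3f}")
```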

Cited by 9 publications (2 citation statements)
References 42 publications
“…To identify groups of people with dementia who have similar AT needs, this study employed cluster analysis. This technique is essentially concerned with discovering intrinsic discrete groups within data (27)(28)(29). Reduction of a heterogeneous sample into a number of more homogeneous groups provides a means to organise large quantities of information and facilitates consideration of multiple characteristics (30).…”
Section: Methods
confidence: 99%
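The reduction step described in the citation statement above can be illustrated with a minimal sketch: hypothetical categorical attributes are one-hot encoded and a hierarchical clustering splits the sample into a small number of more homogeneous groups. The column names, the encoding, and the choice of AgglomerativeClustering are assumptions for illustration, not the cited study's actual pipeline.

```python
# Sketch: reduce a heterogeneous sample into more homogeneous subgroups.
# All attribute names and values below are hypothetical.
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

df = pd.DataFrame({
    "mobility_aid":   ["walker", "none", "wheelchair", "walker", "none"],
    "memory_support": ["reminder_app", "calendar", "none", "calendar", "reminder_app"],
    "living":         ["alone", "with_family", "care_home", "alone", "with_family"],
})

# One-hot encode the categorical attributes so distances can be computed.
X = pd.get_dummies(df)

# Partition the sample into two more homogeneous groups.
df["group"] = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(df)
```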
“…To apply parallel processing to large-scale datasets from distributed computers, it is necessary to ensure that no chunk of data processed in parallel requires special handling; each chunk must adapt to the available computing resources, and the data volume of each chunk must be adjustable [6,8]. The method adopted here to split the data is to use the initial point filtering method in the K-means++ algorithm for determining the centroids [9].…”
Section: Splitting the Data
confidence: 99%
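A rough sketch of that splitting idea follows, under the assumption that plain k-means++ seeding (scikit-learn's kmeans_plusplus) is an acceptable stand-in for the cited initial point filtering variant: the seeds serve as centroids, and each record is routed to its nearest seed to form the chunks handed to parallel workers.

```python
# Sketch: split a dataset into k chunks using k-means++ seeded centroids.
# Illustration only; the cited paper's exact "initial point filtering"
# method is not reproduced here.
import numpy as np
from sklearn.cluster import kmeans_plusplus
from sklearn.metrics import pairwise_distances_argmin

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 8))   # stand-in for a large-scale dataset

k = 4                              # number of parallel chunks
centers, _ = kmeans_plusplus(X, n_clusters=k, random_state=0)

# Assign every record to its nearest seed to form the splits.
assignment = pairwise_distances_argmin(X, centers)
splits = [X[assignment == i] for i in range(k)]
print([len(s) for s in splits])
```

Because the assignment is just a nearest-centroid rule, records can be rebalanced between neighbouring chunks afterwards if the volumes need adjusting to match the available compute resources.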