2018
DOI: 10.1111/insr.12274

Distance Metrics and Clustering Methods for Mixed‐type Data

Abstract: In spite of the abundance of clustering techniques and algorithms, clustering mixed interval (continuous) and categorical (nominal and/or ordinal) scale data remains a challenging problem. In order to identify the most effective approaches for clustering mixed-type data, we use both theoretical and empirical analyses to present a critical review of the strengths and weaknesses of the methods identified in the literature. Guidelines on approaches to use under different scenarios are provided, along with potential…

Cited by 50 publications (45 citation statements)
References 123 publications
“…Further, as input, we used a dataset with no preprocessing. These distance metrics serve to capture the differences between the data samples and vary in their capacity to deal with large outliers (i.e., between weighted, centroid, and median metrics) or if they allow choosing the number of clusters to consider (e.g., Ward) (Foss, Markatou & Ray, 2019). After this clustering, we tested all of the datasets created in the previous step to determine the best preprocessing methodology.…”
Section: Unsupervised Learning Experiments; citation type: mentioning
confidence: 99%
“…For a thorough review of model‐based clustering methods for mixed‐type data we refer the reader to Foss et al ().…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…The procedure to transform the input HTML files into weighted graphs was as follows: each DOM node was transformed into a graph node; the graph nodes were connected according to the links of their corresponding DOM nodes; the weights were computed by measuring the distance between the attributes of the corresponding nodes. Since the DOM nodes have real‐valued, categorical, and ordinal attributes, we resorted to Foss et al's [22] approach to compute the distances among them.…”
Section: Experimental Analysis; citation type: mentioning
confidence: 99%
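
To make the last statement concrete, here is a minimal sketch of one common way to measure distance between records with real-valued, categorical, and ordinal attributes. It uses a Gower-style dissimilarity as a stand-in and is not the approach from the cited paper. The function name `gower_distance`, the attribute kinds, and the example DOM-node attributes (width, tag, depth rank) are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Union

Value = Union[float, int, str]


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    kinds: Sequence[str],
    ranges: Sequence[float],
) -> float:
    """Gower-style dissimilarity in [0, 1] between two mixed-type records.

    kinds[i] is "num" (real-valued), "ord" (ordinal, encoded as a rank),
    or "cat" (categorical). ranges[i] is the observed max - min of
    attribute i for "num" and "ord" attributes; it is ignored for "cat".
    This is an illustrative assumption, not the cited authors' method.
    """
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "cat":
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
        else:
            # Range-normalised absolute difference; a zero range means
            # the attribute is constant and contributes nothing.
            total += abs(x - y) / rng if rng else 0.0
    # Average the per-attribute contributions with equal weights.
    return total / len(kinds)


# Hypothetical DOM-node attributes: (width in px, tag name, depth rank).
kinds = ("num", "cat", "ord")
ranges = (200.0, 0.0, 4.0)
node_a = (120.0, "div", 2)
node_b = (80.0, "span", 3)

edge_weight = gower_distance(node_a, node_b, kinds, ranges)
```

Equal weighting of attributes is the simplest choice here; how to balance continuous and categorical contributions is one of the issues the reviewed paper examines, so the weights in a real application would need more care.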