2019
DOI: 10.1007/s00357-019-09317-5

Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering

Abstract: This paper deals with similarity measures for categorical data in hierarchical clustering that can handle variables with more than two categories and that aim to replace the simple matching approach standardly used in this area. These similarity measures take into account additional characteristics of a dataset, such as the frequency distribution of categories or the number of categories of a given variable. The paper has two main aims. First, to compare and evaluate the selected similarity measures rega…
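To make the contrast between simple matching and a measure that uses the number of categories concrete, here is a minimal Python sketch of simple matching (overlap) next to the Eskin measure as tabulated by Boriah, Chandola & Kumar (2008). The function names, the equal 1/m weighting of variables and the example data are our own illustrative assumptions, not the paper's code. For hierarchical clustering, a similarity S is typically turned into a dissimilarity 1 - S.

```python
from typing import Sequence


def simple_matching(x: Sequence[str], y: Sequence[str]) -> float:
    """Share of variables on which two objects take the same category."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of variables")
    return sum(a == b for a, b in zip(x, y)) / len(x)


def eskin(x: Sequence[str], y: Sequence[str], n_categories: Sequence[int]) -> float:
    """Eskin similarity: a match scores 1, a mismatch on variable k scores
    n_k^2 / (n_k^2 + 2), so mismatches on variables with many categories
    are penalised less than mismatches on binary variables."""
    if not (len(x) == len(y) == len(n_categories)):
        raise ValueError("lengths must agree")
    total = 0.0
    for a, b, n_k in zip(x, y, n_categories):
        total += 1.0 if a == b else n_k ** 2 / (n_k ** 2 + 2)
    return total / len(x)  # equal weights 1/m (assumption)


# Hypothetical example: three variables with 3, 2 and 2 categories.
x = ("red", "small", "yes")
y = ("red", "large", "no")
print(simple_matching(x, y))
print(eskin(x, y, (3, 2, 2)))
```

Both measures agree on matches; they differ only in how much a mismatch costs, which is where the number of categories enters.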

Cited by 37 publications (21 citation statements)
References 21 publications

“…Goodall and Eskin aim to weight values higher that match infrequently, while IOF and Lin give greater weight to values that match frequently, and lower weight to infrequent matches ( Boriah, Chandola & Kumar, 2008 ). All of these measures have been shown previously to perform well on different datasets in different conditions ( Šulc, 2016 ), emphasising the need to test a range of methodologies when clustering ecological data.…”
Section: Discussion (mentioning)

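The frequency-based weighting described in this statement can be sketched in Python for two of the named measures, IOF and Lin, following the definitions tabulated by Boriah, Chandola & Kumar (2008) as we understand them. The natural logarithm, the equal 1/m weights for IOF, the helper names and the toy data are assumptions for illustration; this is not the nomclust implementation. The compared objects must come from the dataset, so that every category frequency is positive.

```python
import math
from collections import Counter
from typing import List, Sequence

Row = Sequence[str]


def category_counts(data: List[Row]) -> List[Counter]:
    """f_k(c): how often category c occurs in variable k across the dataset."""
    return [Counter(row[k] for row in data) for k in range(len(data[0]))]


def iof(x: Row, y: Row, counts: List[Counter]) -> float:
    """Inverse occurrence frequency. A match scores 1; a mismatch scores
    1 / (1 + log f(x_k) * log f(y_k)), so mismatches between frequent
    categories count as less similar than mismatches between rare ones."""
    total = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if a == b:
            total += 1.0
        else:
            total += 1.0 / (1.0 + math.log(counts[k][a]) * math.log(counts[k][b]))
    return total / len(x)  # equal weights 1/m (assumption)


def lin(x: Row, y: Row, counts: List[Counter], n: int) -> float:
    """Lin measure. A match contributes 2 log p(x_k), a mismatch
    2 log(p(x_k) + p(y_k)); the sum is normalised by
    sum_k [log p(x_k) + log p(y_k)], giving a value in [0, 1]."""
    num = den = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        pa, pb = counts[k][a] / n, counts[k][b] / n
        num += 2 * math.log(pa) if a == b else 2 * math.log(pa + pb)
        den += math.log(pa) + math.log(pb)
    if den == 0.0:  # every compared category occurs in all n objects
        return 1.0
    return num / den


# Hypothetical toy data: 5 objects, 2 categorical variables.
data = [("a", "p"), ("a", "p"), ("b", "q"), ("b", "p"), ("a", "q")]
counts = category_counts(data)
print(iof(data[0], data[2], counts))
print(lin(data[0], data[3], counts, len(data)))
```

In Lin, a match on a frequent category has a log probability close to zero, while the normalising term grows with rare categories; comparing several such measures on the same data is exactly what the quoted passage recommends.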
“…To select the best distance matrix and clustering method for our data we utilised internal evaluation measures available from nomclust ( Šulc & Řezanková, 2015 ). The within-cluster entropy coefficient (WCE) is a measure of compactness which evaluates the variability of each cluster by calculating a measure of normalised entropy (the number of variables that have the same categories from each of the variables evaluated) ( Šulc, 2016 ). WCE is measured from 0 to 1, where a lower value indicates intra-cluster homogeneity.…”
Section: Methods (mentioning)

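The quoted description of WCE can be illustrated with a small Python sketch. It assumes one plausible formulation. For each cluster, the Shannon entropy of every variable is divided by ln K_c, where K_c is the number of categories of variable c in the whole dataset. These normalised entropies are averaged over the m variables, and the cluster values are combined with weights n_g/n. This weighting and normalisation are our assumptions and may not match nomclust's WCE exactly. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Row = Sequence[str]


def normalised_entropy(values: Sequence[str], n_categories: int) -> float:
    """Shannon entropy of one variable inside one cluster, divided by ln(K)."""
    if n_categories < 2:
        return 0.0  # a variable with a single category cannot vary
    size = len(values)
    h = -sum((c / size) * math.log(c / size) for c in Counter(values).values())
    return h / math.log(n_categories)


def within_cluster_entropy(data: List[Row], labels: Sequence[int]) -> float:
    """Assumed WCE: sum over clusters of (n_g / n) * mean normalised entropy."""
    n, m = len(data), len(data[0])
    # K_c: categories observed for each variable in the whole dataset
    n_categories = [len({row[c] for row in data}) for c in range(m)]
    clusters: Dict[int, List[Row]] = {}
    for row, g in zip(data, labels):
        clusters.setdefault(g, []).append(row)
    wce = 0.0
    for members in clusters.values():
        mean_h = sum(
            normalised_entropy([row[c] for row in members], n_categories[c])
            for c in range(m)
        ) / m
        wce += len(members) / n * mean_h
    return wce


# Hypothetical example: a perfectly homogeneous clustering vs a mixed one.
toy = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
print(within_cluster_entropy(toy, [0, 0, 1, 1]))
print(within_cluster_entropy(toy, [0, 1, 0, 1]))
```

Because each entropy is bounded by ln K_c, the result stays between 0 (every cluster homogeneous on every variable) and 1 (categories spread evenly within every cluster), matching the range stated in the quote.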
“…Subjective norms (SN): this variable was measured as follows. Respondents were asked whom they consider responsible for energy conservation. Four groups were offered as answers to this question: "my friends", "government", "my family members" and "my neighbors" (Cronbach's alpha: 0.73; mean for friends: 2.46 (standard deviation: 1.05); government: 1.72 (standard deviation: 0.9); family members: 2.09 (standard deviation: 1.01); neighbors: 2.24 (standard deviation: 1.07)). On average, households held all four groups responsible, but the government received the lowest mean, which could indicate that people in Iranian society still do not want to, or cannot, believe that the people are ahead of the government in pursuing the right pattern of public consumption (see Table 3) [97][98][99][100][101][102][103][104][105]. Perceived behavioral control (PBC) refers to whether respondents feel that they can save electricity in their homes.…”
Section: Variables of the Theory of Planned Behavior (mentioning)