2011
DOI: 10.1007/978-3-642-20847-8_39

DISC: Data-Intensive Similarity Measure for Categorical Data

Abstract: The concept of similarity is fundamentally important in almost every scientific field. Clustering, distance-based outlier detection, classification, regression and search are major data mining techniques which compute the similarities between instances and hence the choice of a particular similarity measure can turn out to be a major cause of success or failure of the algorithm. The notion of similarity or distance for categorical data is not as straightforward as for continuous data and hence, is a …

Cited by 16 publications (5 citation statements)
References: 16 publications

“…To assess the applicability of similarity measures for a specific CBR system the following evaluation criteria are used (derived from Boriah et al, 2008; Desai et al, 2011; Sulc and Rezanková, 2014)…”
Section: Case Retrieval (mentioning)
confidence: 99%
“…Conversely, authors in [19] make a comparative study of categorical variable encodings for predicting vehicle properties using neural networks. A similarity measure to handle categorical variables is introduced in [20] and validated in KNN regression problems over twelve datasets. In [21] authors propose a hybrid decision tree algorithm for mixed categorical and numerical regression analysis.…”
Section: Related Work (mentioning)
confidence: 99%
“…If data set contains only categorical attributes, algorithms like ROCK, K-modes [8], DISC [4] etc are available. Most of existing algorithms for mixed data types, have different distance measures for both numerical and categorical variables and combine those during clustering.…”
Section: Related Work (mentioning)
confidence: 99%