2008
DOI: 10.1007/s10618-008-0103-4

Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results

Abstract: Clustering with constraints is a powerful method that allows users to specify background knowledge and the expected cluster properties. Significant work has explored the incorporation of instance-level constraints into non-hierarchical clustering but not into hierarchical clustering algorithms. In this paper we present a formal complexity analysis of the problem and show that constraints can be used to not only improve the quality of the resultant dendrogram but also the efficiency of the algorithms. This is p…

Cited by 71 publications (51 citation statements)
References: 14 publications

“…In our problem the answer is obviously yes because all of the constraints are true. However determining whether there is a feasible solution which satisfy all constraints is NP-complete [8]. In HACC, dead-end situations (reaching an iteration with more than K clusters, where no further pair of clusters can be joined without violating any of the constraints) can occur in principle, but in practice we find this is not a problem.…”
Section: Minimum Spanning Forests (MSF) | Citation type: mentioning
confidence: 96%
“…Since we are working with hierarchical clustering with constraints we have to discuss feasibility and dead-end issues [8]. The feasibility problem is defined as, given a set of data and set of constraints, does there exist a partitioning of the data into K clusters?…”
Section: Minimum Spanning Forests (MSF) | Citation type: mentioning
confidence: 99%
“…We use a clustering algorithm that works based on a distance function, which determines the distance between every two attributes. Specifically we use the single-link CAHC (Constrained Agglomerative Hierarchical Clustering) algorithm [8]. In the rest of this section, we first describe our distance function.…”
Section: Schema Matching | Citation type: mentioning
confidence: 99%
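
The citation statements above refer to single-link constrained agglomerative clustering and to dead-end situations, where more than K clusters remain but no pair can be merged without violating a constraint. The following is a minimal sketch of that idea, not the algorithm from the cited paper: it assumes one-dimensional points, absolute difference as the distance, and cannot-link constraints only. The function name `constrained_single_link` and its parameters are hypothetical.

```python
from itertools import combinations


def constrained_single_link(points, cannot_link, k):
    """Greedy single-link merging under cannot-link constraints.

    points      -- list of floats (1-D data, an assumption of this sketch)
    cannot_link -- iterable of (i, j) index pairs that must stay apart
    k           -- target number of clusters
    Returns (clusters, dead_end), where dead_end is True if merging
    stopped with more than k clusters because no legal merge remained.
    """
    clusters = [{i} for i in range(len(points))]
    forbidden = {frozenset(pair) for pair in cannot_link}

    def dist(a, b):
        # Single link: the smallest distance between any two members.
        return min(abs(points[i] - points[j]) for i in a for j in b)

    def allowed(a, b):
        # A merge is legal only if no cannot-link pair spans the two clusters.
        return not any(frozenset((i, j)) in forbidden for i in a for j in b)

    while len(clusters) > k:
        candidates = [
            (dist(a, b), ia, ib)
            for (ia, a), (ib, b) in combinations(enumerate(clusters), 2)
            if allowed(a, b)
        ]
        if not candidates:
            return clusters, True  # dead end: nothing can be merged
        _, ia, ib = min(candidates)
        clusters[ia] |= clusters[ib]  # ia < ib, so deleting ib is safe
        del clusters[ib]
    return clusters, False


# Feasible for k=2 (e.g. {0, 1} and {2, 3}), but the greedy order gets stuck.
clusters, dead_end = constrained_single_link(
    [0.0, 4.0, 10.0, 5.0], cannot_link=[(0, 2), (1, 2), (0, 3)], k=2
)
```

In the usage example, points 1 and 3 are closest and are merged first; after that, every remaining pair of clusters is separated by a cannot-link constraint, so the run ends with three clusters even though a valid two-cluster partition exists. This illustrates why feasibility and dead ends are distinct questions.
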
“…Must-link constraints are usually considered an easier case than Cannot-link constraints, since Must-link constraints are transitive and can be represented as a shortening of the distance between the pair to zero or some other small constant [2,3]. On the other hand, Cannot-link constraints are non-transitive and have no obvious geometric interpretation [2,6,7] and therefore considered a difficult problem.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
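
The statement above notes that must-link constraints are transitive. One common way to exploit this, sketched here under the assumption that instances are identified by integer indices, is to collapse must-link pairs into connected components with a union-find structure before clustering. The function name `must_link_components` is hypothetical and does not come from any of the cited papers.

```python
def must_link_components(n, must_link):
    """Group n instances into the transitive closure of must-link pairs.

    n         -- number of instances, indexed 0 .. n-1
    must_link -- iterable of (i, j) pairs that must share a cluster
    Returns a list of sets, one per connected component.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in must_link:
        parent[find(a)] = find(b)  # union the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


# ML(0, 1) and ML(1, 2) imply that 0 and 2 also belong together.
groups = must_link_components(5, [(0, 1), (1, 2)])
```

Each resulting component can then be treated as a single starting cluster. Cannot-link constraints have no analogous closure, which is consistent with the statement's remark that they are the harder case.
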