Frey and Dueck (Reports, 16 February 2007, p. 972) described an algorithm termed "affinity propagation" (AP) as a promising alternative to traditional data clustering procedures. We demonstrate that a well-established heuristic for the p-median problem often obtains clustering solutions with lower error than AP and produces these solutions in comparable computation time.

Frey and Dueck (1) described an algorithm for analyzing complex data sets termed "affinity propagation" (AP). The algorithm extracts a subset of representative objects or "exemplars" from the complete object set by exchanging real-valued messages between data points. Clusters are formed by assigning each data point to its most similar exemplar. The authors reported that "[a]ffinity propagation found clusters with much lower error than other methods, and it did so in less than one-hundredth the amount of time" (1). We demonstrate that an efficient implementation of a 40-year-old heuristic for the well-known p-median model (PMM) often provides lower-error solutions than AP in comparable central processing unit (CPU) time.

For consistency with AP in (1), we present the PMM as a sum-of-similarities maximization problem, while recognizing that this is equivalent to the more common form of minimizing the sum of dissimilarities (e.g., distances or costs). The PMM is a general mathematical problem that can be concisely stated as follows: Given an m × n similarity matrix, S, select p columns from S such that the sum of the maximum values within each row of the selected columns is maximized (2). Thus, each row is effectively assigned to its most similar selected column (exemplar) with the goal of maximizing overall similarity. One classic example of the PMM occurs in facility location planning: Locate p plants such that the total distance (or cost) required to serve m demand points is minimized.
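The column-selection objective stated above can be sketched in a few lines of Python. The function names (`pmm_objective`, `pmm_exhaustive`) are illustrative, and the exhaustive enumeration is practical only for tiny instances; it is included solely to make the objective concrete, not as a usable solver:

```python
from itertools import combinations

import numpy as np


def pmm_objective(S, cols):
    """Sum over rows of the maximum similarity among the selected columns."""
    return S[:, cols].max(axis=1).sum()


def pmm_exhaustive(S, p):
    """Exact PMM for tiny instances: enumerate every p-subset of columns."""
    best_cols, best_val = None, -np.inf
    for cols in combinations(range(S.shape[1]), p):
        val = pmm_objective(S, list(cols))
        if val > best_val:
            best_cols, best_val = list(cols), val
    return best_cols, best_val
```

For example, with S built as negative squared Euclidean distances between the points 0, 1, 10, and 11, any optimal choice of p = 2 exemplars leaves a total error of 2, i.e., an objective value of -2.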
In data analysis applications where S is an n × n matrix of negative squared Euclidean distances between objects, clustering the n objects using the PMM corresponds to the selection of p exemplars to minimize error, which is defined as the sum of the squared Euclidean distances of each object to its nearest exemplar.

Lagrangian relaxation methods enable the exact solution of PMM instances with n ≤ 500 objects (3, 4). For larger problems, a vertex substitution heuristic (VSH) developed in (5) has been the standard for comparison for nearly four decades. The VSH begins with the random selection of a subset of p exemplars, which is iteratively refined by evaluating the effects of substituting an unselected point for one of the selected exemplars. Frey and Dueck assert that this type of strategy "works well only when the number of clusters is small and chances are good that at least one random initialization is close to a good solution" (1). To the contrary, the VSH is remarkably effective and is often the engine for metaheuristics such as tabu search (6) and variable neighborhood search (7).

We compared AP to an efficient implementation of VSH (7) across eight data sets...
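The swap-based refinement just described can be sketched as follows. This is a minimal illustration of the Teitz-Bart-style substitution step (random start, accept any improving exchange, repeat until no exchange improves the objective), not the efficient implementation evaluated in the comment:

```python
import random

import numpy as np


def vsh(S, p, seed=0):
    """Vertex substitution heuristic sketch for the PMM.

    S: n x n similarity matrix (e.g., negative squared Euclidean distances).
    Returns a locally optimal set of p exemplars and its objective value.
    """
    rng = random.Random(seed)
    n = S.shape[0]
    exemplars = rng.sample(range(n), p)
    obj = S[:, exemplars].max(axis=1).sum()
    improved = True
    while improved:
        improved = False
        for cand in range(n):
            if cand in exemplars:
                continue
            # Try substituting the candidate for each current exemplar.
            for i in range(p):
                trial = exemplars[:i] + [cand] + exemplars[i + 1:]
                trial_obj = S[:, trial].max(axis=1).sum()
                if trial_obj > obj:
                    exemplars, obj = trial, trial_obj
                    improved = True
                    break
    return sorted(exemplars), obj
```

Each pass evaluates O(np) candidate swaps; the loop terminates because the objective strictly increases with every accepted substitution and can take only finitely many values.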
The p-median clustering model represents a combinatorial approach to partitioning data sets into disjoint, non-hierarchical groups. Classes are constructed around exemplars, which are manifest objects in the data set, with the remaining objects assigned to their closest cluster centers. Effective, state-of-the-art implementations of p-median clustering are virtually unavailable in the popular social and behavioral science statistical software packages. We present p-median clustering, including a detailed description of its mechanics and a discussion of available software programs and their capabilities. An application to a complex structured data set on the perception of food items illustrates p-median clustering.
The monotone homogeneity model (MHM, also known as the unidimensional monotone latent variable model) is a nonparametric IRT formulation that provides the underpinning for partitioning a collection of dichotomous items to form scales. Ellis (Psychometrika 79:303-316, 2014, doi:10.1007/s11336-013-9341-5) has recently derived inequalities that are implied by the MHM, yet require only the bivariate (inter-item) correlations. In this paper, we incorporate these inequalities within a mathematical programming formulation for partitioning a set of dichotomous scale items. The objective criterion of the partitioning model is to produce clusters of maximum cardinality. The formulation is a binary integer linear program that can be solved exactly using commercial mathematical programming software. However, we have also developed a standalone branch-and-bound algorithm that produces globally optimal solutions. Simulation results and a numerical example are provided to demonstrate the proposed method.
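The abstract does not reproduce the Ellis inequalities or the full partitioning formulation, so the following is only a generic branch-and-bound sketch for a maximum-cardinality selection problem. The hypothetical `compatible` predicate stands in for the bivariate inequality checks, and the routine finds a single largest feasible item subset rather than the paper's full partition:

```python
def max_feasible_subset(items, compatible):
    """Branch-and-bound sketch: largest subset of `items` in which every
    pair satisfies `compatible` (a stand-in for the inequality checks).
    """
    best = []

    def recurse(chosen, remaining):
        nonlocal best
        # Bound: prune if even taking every remaining item cannot beat best.
        if len(chosen) + len(remaining) <= len(best):
            return
        if not remaining:
            best = chosen[:]  # strictly larger, by the bound above
            return
        head, rest = remaining[0], remaining[1:]
        # Branch 1: include head if it is pairwise-feasible with choices so far.
        if all(compatible(head, c) for c in chosen):
            recurse(chosen + [head], rest)
        # Branch 2: exclude head.
        recurse(chosen, rest)

    recurse([], list(items))
    return best
```

The cardinality bound mirrors the pruning logic of a standalone branch-and-bound solver: a partial solution is abandoned as soon as it provably cannot exceed the incumbent.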