Cluster analysis of ranking data, which occurs in consumer questionnaires, voting forms or other inquiries of preferences, attempts to identify typical groups of rank choices. Empirically measured rankings are often incomplete, i.e. different numbers of filled rank positions cause heterogeneity in the data. We propose a mixture approach for clustering of heterogeneous rank data. Rankings of different lengths can be described and compared by means of a single probabilistic model. A maximum entropy approach avoids hidden assumptions about missing rank positions. Parameter estimators and an efficient EM algorithm for unsupervised inference are derived for the ranking mixture model. Experiments on both synthetic data and real-world data demonstrate significantly improved parameter estimates on heterogeneous data when the incomplete rankings are included in the inference process.
Sorting algorithms like MergeSort or BubbleSort order items according to some criterion. Whereas the computational complexities of the various sorting algorithms are well understood, their behavior with noisy input data or unreliable algorithm operations is less well known. In this work, we present an information-theoretic approach to quantifying the information content of algorithms. We exemplify the significance of this approach by comparing different algorithms with respect to both informativeness and stability. For the first time, the amount of order information that a sorting algorithm can extract in uncertain settings is measured quantitatively. Such measurements not only make a principled comparison of algorithms possible, but also guide the design and construction of algorithms that provide the maximum information. Results for five popular sorting algorithms are illustrated, giving new insights into the amount of ordering information they can achieve. For example, in noisy settings, BubbleSort can outperform MergeSort in the number of bits that can be effectively extracted per comparison made.
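The noisy setting described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' information-theoretic measure: it assumes a hypothetical noise model in which each comparison is flipped independently with probability `p`, and it uses the number of inversions left in the output as a crude stand-in for how much order information was lost. The names `noisy_less`, `bubble_sort`, `merge_sort`, and `inversions` are illustrative only.

```python
import random


def noisy_less(a, b, p, rng, counter):
    """Return a < b, flipping the answer with probability p (assumed noise model)."""
    counter[0] += 1  # count every comparison made
    truth = a < b
    return (not truth) if rng.random() < p else truth


def bubble_sort(items, p, rng):
    """BubbleSort driven by noisy comparisons; returns (output, comparison count)."""
    xs = list(items)
    counter = [0]
    n = len(xs)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if noisy_less(xs[j + 1], xs[j], p, rng, counter):
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs, counter[0]


def merge_sort(items, p, rng):
    """Top-down MergeSort driven by noisy comparisons; returns (output, comparison count)."""
    counter = [0]

    def sort(xs):
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        left, right = sort(xs[:mid]), sort(xs[mid:])
        out = []
        i = j = 0
        while i < len(left) and j < len(right):
            if noisy_less(right[j], left[i], p, rng, counter):
                out.append(right[j])
                j += 1
            else:
                out.append(left[i])
                i += 1
        out.extend(left[i:])
        out.extend(right[j:])
        return out

    return sort(list(items)), counter[0]


def inversions(xs):
    """Number of out-of-order pairs, i.e. the Kendall tau distance to the sorted order."""
    return sum(
        1
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[i] > xs[j]
    )


if __name__ == "__main__":
    rng = random.Random(0)
    data = rng.sample(range(20), 20)
    for name, algo in (("BubbleSort", bubble_sort), ("MergeSort", merge_sort)):
        out, n_cmp = algo(data, 0.1, rng)
        print(name, "comparisons:", n_cmp, "remaining inversions:", inversions(out))
```

Dividing a quality score by the comparison count gives a rough per-comparison figure in the spirit of the abstract's "bits per comparison", although the paper's actual quantity is defined information-theoretically and is not reproduced here.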