The minimum sum-of-squares clustering problem is an important problem in data mining and machine learning with many applications in, e.g., medicine or the social sciences. However, it is known to be NP-hard in all relevant cases and is notoriously hard to solve to global optimality in practice. In this paper, we develop and test different tailored mixed-integer programming techniques to improve the performance of state-of-the-art MINLP solvers when applied to the problem, among them cutting planes, propagation techniques, branching rules, and primal heuristics. Our extensive numerical study shows that our techniques significantly improve the performance of the open-source MINLP solver. Consequently, using our novel techniques, we can solve many instances that are not solvable without them, and we obtain much smaller gaps for those instances that still cannot be solved to global optimality.
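For reference, the underlying problem can be stated as the following standard MINLP formulation of minimum sum-of-squares clustering; the notation (data points $p_i \in \mathbb{R}^d$, cluster centers $c_j$, binary assignment variables $x_{ij}$) is ours and is not taken verbatim from the paper:

$$\min_{x,\,c} \ \sum_{i=1}^{n} \sum_{j=1}^{k} x_{ij}\,\lVert p_i - c_j \rVert_2^2 \quad \text{s.t.} \quad \sum_{j=1}^{k} x_{ij} = 1 \ \text{for all } i, \qquad x_{ij} \in \{0,1\},\ c_j \in \mathbb{R}^d.$$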
Cardinality-constrained optimization problems are notoriously hard to solve in both theory and practice. However, as famous examples such as the sparse portfolio optimization and best subset selection problems show, this class is extremely important in real-world applications. In this paper, we apply a penalty alternating direction method to these problems. The key idea is to split the problem along its discrete-continuous structure into two subproblems that are much easier to solve than the original problem and to couple these subproblems via a classic penalty framework. The method can be seen as a primal heuristic for which convergence results are readily available from the literature. In our extensive computational study, we first show that the method is competitive with a commercial mixed-integer programming solver for the portfolio optimization problem. On these instances, we also test a variant of our approach that uses a perspective reformulation of the problem. For the best subset selection problem, it turns out that our method significantly outperforms commercial solvers and is at least competitive with state-of-the-art methods from the literature.
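To illustrate the splitting idea, here is a minimal Python sketch of a penalty alternating direction method for best subset selection, i.e., minimizing $\lVert Ax - b \rVert_2^2$ subject to $\lVert x \rVert_0 \le s$: the continuous block is updated by a penalized least-squares solve and the discrete block by hard thresholding, with the penalty parameter increased between rounds. The concrete penalty schedule, stopping rule, and all names below are our own illustrative choices, not the exact scheme from the paper.

```python
# Illustrative PADM sketch for best subset selection:
#   min ||A x - b||^2  s.t.  ||x||_0 <= s
# The problem is split into a continuous block x and a
# cardinality-feasible block y, coupled via a quadratic penalty.
import numpy as np

def padm_best_subset(A, b, s, rho=1.0, rho_growth=10.0,
                     outer_iters=8, inner_iters=50, tol=1e-8):
    m, n = A.shape
    x = np.zeros(n)  # continuous block
    y = np.zeros(n)  # discrete (cardinality-feasible) block
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(outer_iters):          # penalty loop
        for _ in range(inner_iters):      # alternating minimization
            # x-update: penalized least squares, solved in closed form:
            # (A^T A + rho I) x = A^T b + rho y
            x = np.linalg.solve(AtA + rho * np.eye(n), Atb + rho * y)
            # y-update: projection onto {||y||_0 <= s} by keeping the
            # s entries of x with the largest magnitude (hard thresholding)
            y_new = np.zeros(n)
            idx = np.argsort(np.abs(x))[-s:]
            y_new[idx] = x[idx]
            if np.linalg.norm(y_new - y) < tol:
                y = y_new
                break
            y = y_new
        if np.linalg.norm(x - y) < tol:   # blocks agree: feasible point found
            break
        rho *= rho_growth                 # tighten the coupling
    return y

# Example: recover a 3-sparse signal from noisy measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 30))
x_true = np.zeros(30)
x_true[[2, 7, 19]] = [1.5, -2.0, 1.0]
b = A @ x_true + 0.01 * rng.standard_normal(100)
print(np.flatnonzero(padm_best_subset(A, b, s=3)))  # expected: [ 2  7 19]
```

Both subproblems admit cheap solutions (a linear solve and a sort), which is exactly the appeal of the discrete-continuous splitting described above.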
k-means clustering is a classic method of unsupervised learning with the aim of partitioning a given set of measurements into k clusters. In many modern applications, however, this approach suffers from unstructured measurement errors because the k-means clustering result then represents a clustering of the erroneous measurements instead of retrieving the true underlying clustering structure. We resolve this issue by applying techniques from robust optimization to hedge the clustering result against unstructured errors in the observed data. To this end, we derive the strictly and $\Gamma$-robust counterparts of the k-means clustering problem. Since the nominal problem is already NP-hard, global approaches are often not feasible in practice. As a remedy, we develop tailored alternating direction methods by decomposing the search space of the nominal as well as of the robustified problems to quickly obtain feasible points of good quality. Our numerical results reveal an interesting feature: the less conservative $\Gamma$-approach is clearly outperformed by the strictly robust clustering method. In particular, the strictly robustified clustering method is able to recover clusterings of the original data even if only erroneous measurements are observed.
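As an illustration of the decomposition idea, the following Python sketch alternates between the two easy subproblems of the nominal k-means problem: assignments for fixed centroids and centroids for fixed assignments. The strictly and $\Gamma$-robust variants from the paper replace these steps with worst-case counterparts, which are not reproduced here; all names and parameters below are our own.

```python
# Alternating-direction sketch for the *nominal* k-means problem:
# each subproblem (assignment / centroid update) is solved to optimality
# while the other block of variables is held fixed.
import numpy as np

def alternating_kmeans(P, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        d = np.linalg.norm(P[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # centroid step: the mean of the assigned points is optimal
        # for fixed assignments; empty clusters keep their old center
        new_centroids = np.array([
            P[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example: two well-separated blobs
rng = np.random.default_rng(1)
P = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = alternating_kmeans(P, k=2)
```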