A data filtering method for cluster analysis is proposed, based on minimizing a least squares function with a weighted $\ell_0$-norm penalty. To overcome the discontinuity of the objective function, smooth non-convex functions are employed to approximate the $\ell_0$-norm. The convergence of the global minimum points of the approximating problems towards global minimum points of the original problem is stated. The proposed method also exploits a suitable technique to choose the penalty parameter. Finally, numerical results on synthetic and real data sets are provided, showing how some existing clustering methods can benefit from the proposed filtering strategy.
Keywords. Zero-norm approximation · Cluster analysis · Nonlinear optimization

AMS subject classifications. 90C30 · 62H30 · 90C06 · 49M15
Motivation

Cluster analysis is a branch of unsupervised learning that arises in many real-world applications across different fields, e.g., biology, medicine, marketing, document retrieval, and image segmentation. It deals with grouping objects so that "alike" data are in the same cluster and "unlike" data are in different clusters. More formally, given a finite set of vectors $X = \{x_1, \dots, x_m\} \subset \mathbb{R}^n$, we want to divide $X$ into $k$ groups (clusters) according to a defined measure of similarity, where $k$ can be either known or unknown.

Partitioning $X$ into a fixed number of clusters is known to be an NP-hard problem [9], and many existing clustering models are formulated as non-convex optimization problems. As a result, algorithms can generally find only approximate solutions. Moreover, there is no objectively "right" clustering model, and the choice of the most suitable algorithm can strongly depend on the specific data set. So, there is still great interest in developing new strategies for cluster analysis, also in the field of numerical optimization.

Here, we propose a data filtering method based on combining two different techniques. The first one is a reformulation of the clustering problem as a penalized regression problem, proposed in [21,11,14] and further studied in [20,3,18]. Assuming that the number of clusters is unknown, this approach is based on introducing for each observation 1