We introduce a new method for performing clustering with the aim of fitting clusters with different scatters and weights. It is designed to handle a proportion α of contaminating data, which guarantees the robustness of the method. As a characteristic feature, restrictions on the ratio between the maximum and the minimum eigenvalues of the groups' scatter matrices are introduced. These restrictions make the problem well defined and guarantee the consistency of the sample solutions to the population ones. The method covers a wide range of clustering approaches depending on the strength of the chosen restrictions. Our proposal includes an algorithm for approximately solving the sample problem.

[...]ous if a small group of tightly joined outliers should be considered as a proper group instead of a contamination phenomenon. Finally, note that the precise detection of outliers is an important task, both because of the serious problems they cause in standard clustering procedures (see, e.g., [...] and Hennig [19]) and because outliers can be of interest in their own right once it is explained why they depart from the general behavior.

Two general model-based approaches that provide a theoretically well-founded clustering criterion in the presence of outliers are (see Bock [2]) mixture modeling and the trimming approach. To the first category belongs, for instance, the work by Fraley and Raftery [8], which considers mixture fittings with an additional mixture component accounting for the "noise," or that of McLachlan and Peel [23], which resorts to mixtures of t distributions. In this paper we are concerned with the trimming approach, previously introduced in Cuesta-Albertos, Gordaliza and Matrán [4] and followed by recent proposals by Gallegos [9,10] and Gallegos and Ritter [11] (see also Gordaliza and Matrán [14] and [15]). Notice that trimming approaches usually adopt a "crisp" 0-1 assignment of observations to groups, whereas mixture modeling generally returns group membership probabilities.
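To make the eigenvalue-ratio restriction concrete, here is a minimal Python sketch that works directly on the eigenvalues of each group's sample scatter matrix (eigenvectors held fixed). The helper names `eigen_ratio` and `truncate_eigenvalues` are hypothetical. Choosing the truncation level m by a brute-force log-scale grid search over a Gaussian negative log-likelihood is an assumption made for illustration, not the algorithm proposed in the paper.

```python
import math


def eigen_ratio(eigs):
    """Largest over smallest eigenvalue, pooled across all groups."""
    flat = [v for group in eigs for v in group]
    return max(flat) / min(flat)


def truncate_eigenvalues(eigs, weights, c, grid_size=2000):
    """Clip every eigenvalue into [m, c*m] so that eigen_ratio(...) <= c.

    eigs[j] holds the eigenvalues of group j's sample scatter matrix and
    weights[j] its size. The level m is picked by brute force on a log-scale
    grid, minimising the (assumed) Gaussian cost
        sum_j weights[j] * sum_l (log(t_jl) + d_jl / t_jl),
    where t_jl is the clipped version of eigenvalue d_jl.
    """
    if eigen_ratio(eigs) <= c:
        return [list(group) for group in eigs]  # already feasible: unchanged

    def clip(d, m):
        return min(max(d, m), c * m)

    def cost(m):
        return sum(
            w * sum(math.log(clip(d, m)) + d / clip(d, m) for d in group)
            for group, w in zip(eigs, weights)
        )

    flat = [v for group in eigs for v in group]
    # searching m in [min(flat) / c, max(flat)] is enough: outside that
    # interval, moving m towards it lowers the cost
    lo, hi = math.log(min(flat) / c), math.log(max(flat))
    grid = (math.exp(lo + (hi - lo) * k / grid_size) for k in range(grid_size + 1))
    best_m = min(grid, key=cost)
    return [[clip(d, best_m) for d in group] for group in eigs]
```

A small c forces nearly equal, spherical-looking scatters, while a large c leaves the groups almost unrestricted, which is one way to read the claim that the method spans a range of clustering approaches depending on the strength of the restriction.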
Mixture modeling and trimming also differ in how they treat outliers: while mixture modeling tries to fit the outlying observations within the model, the trimming approach discards them altogether. The methodology presented in this paper falls within the category of trimming methods, and all comparisons will be made within this category.

Knowing how to perform trimming in cluster analysis is not straightforward, because there are no privileged directions in which to search for outlying values and, most of the time, observations falling between the groups ("bridge" data points) must also be removed. The first attempt at trimming in clustering, through an "impartial" approach, appeared in [4] as a modification of the k-means method. Moreover, [12] shows that impartial trimming provides better results in terms of robustness than the use of different penalty functions in the k-means method (e.g., k-medoids).

The use of trimmed k-means has a considerable drawback: like classical k-means, it implicitly assumes the same spherical covariance matrix for all groups. The extension in [...
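The impartial trimming idea behind trimmed k-means can be sketched as follows: the data themselves decide which points to discard, namely the proportion α lying farthest from every center. This is a minimal pure-Python illustration with hypothetical function names; the number of random starts and the simple stopping rule are assumptions, not details taken from [4].

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the closest center and the squared distance to it, per point."""
    labels, dists = [], []
    for p in points:
        j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        labels.append(j)
        dists.append(sq_dist(p, centers[j]))
    return labels, dists


def trimmed_kmeans(points, k, alpha, n_starts=20, n_iter=100, seed=0):
    """Trimmed k-means sketch: returns (centers, labels), label -1 = trimmed."""
    rng = random.Random(seed)
    n = len(points)
    n_trim = math.floor(n * alpha)  # number of observations discarded
    best = None
    for _ in range(n_starts):
        centers = rng.sample(points, k)  # random initial centers
        for _ in range(n_iter):
            labels, dists = assign(points, centers)
            # impartial trimming: drop the n_trim points farthest from every center
            kept = sorted(range(n), key=lambda i: dists[i])[: n - n_trim]
            new_centers = []
            for j in range(k):
                members = [points[i] for i in kept if labels[i] == j]
                if members:
                    new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
                else:
                    new_centers.append(centers[j])  # empty cluster: keep old center
            if new_centers == centers:
                break
            centers = new_centers
        # final trimmed assignment and objective for this start
        labels, dists = assign(points, centers)
        kept = set(sorted(range(n), key=lambda i: dists[i])[: n - n_trim])
        objective = sum(dists[i] for i in kept)
        if best is None or objective < best[0]:
            final = [labels[i] if i in kept else -1 for i in range(n)]
            best = (objective, centers, final)
    return best[1], best[2]
```

Several random starts are used because a start that places a center on an outlier can converge to a poor local solution. Note that each group is summarized only by its mean, which is exactly the shared spherical-covariance limitation described above.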
Outlying data can heavily influence standard clustering methods. At the same time, clustering principles can be useful when robustifying statistical procedures. These two reasons motivate the development of feasible robust model-based clustering approaches. With this in mind, an R package for performing non-hierarchical robust clustering, called tclust, is presented here. Instead of trying to "fit" noisy data, a proportion α of the most outlying observations is trimmed. The tclust package efficiently handles different cluster scatter constraints. Graphical exploratory tools are also provided to help the user make sensible choices for the trimming proportion as well as the number of clusters to search for.
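Because tclust itself is an R package, the sketch below does not attempt to reproduce its interface. It is a hypothetical plain-Python illustration, for univariate data, of one concentration step in the spirit of trimming with different scatters and weights: each observation is scored by its largest weighted normal density, the proportion α with the lowest scores is trimmed, and weights, means and variances are re-estimated from the kept points. Raising small variances to `vmax / c` is a crude stand-in (an assumption) for the package's scatter constraints.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def concentration_step(xs, params, alpha, c):
    """One hypothetical TCLUST-style concentration step for 1-D data.

    params: list of (weight, mean, variance), one tuple per cluster.
    Returns (new_params, labels); label -1 marks a trimmed observation.
    """
    n = len(xs)
    n_trim = math.floor(n * alpha)
    # score each point by its largest weighted density over the clusters
    best = []
    for x in xs:
        vals = [w * normal_pdf(x, m, v) for (w, m, v) in params]
        j = max(range(len(vals)), key=vals.__getitem__)
        best.append((vals[j], j))
    # trim the n_trim points with the lowest scores, assign the rest by argmax
    trimmed = set(sorted(range(n), key=lambda i: best[i][0])[:n_trim])
    labels = [-1 if i in trimmed else best[i][1] for i in range(n)]
    n_kept = n - n_trim
    new_params = []
    for j, (_, m, v) in enumerate(params):
        members = [x for x, lab in zip(xs, labels) if lab == j]
        weight = len(members) / n_kept  # cluster weights may differ
        if len(members) >= 2:  # otherwise keep the previous mean and variance
            m = sum(members) / len(members)
            v = sum((x - m) ** 2 for x in members) / len(members)
        new_params.append((weight, m, v))
    # crude scatter constraint (assumption): raise small variances to vmax / c
    vmax = max(v for _, _, v in new_params)
    new_params = [(w, m, max(v, vmax / c)) for (w, m, v) in new_params]
    return new_params, labels
```

In a full run this step would be repeated from several random starts until the labels stop changing; a cluster of identical points would need an extra safeguard against zero variance.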