Recent advances in high-throughput flow cytometry (FCM) technology require the theoretical development and efficient computational implementation of new methods for the automated identification of cell populations. Compared with manual gating, the current de facto gold standard, these automated methods are expected not only to be faster but also to increase the reproducibility of data analysis pipelines.
According to a recent comprehensive survey of FCM data analysis methods by Bashashati and Brinkman (1), automated gating methods can identify both known and unknown cell populations, the latter including subpopulations that cannot be easily identified using two-dimensional manual gating. To perform unsupervised automated gating of FCM data properly, general clustering methods need to fulfill several criteria: computational efficiency (to handle the very large data sets commonly encountered in a practically reasonable amount of time), robustness to the shape of the clusters (from spherical to concave cell populations such as "banana-shaped" populations) and to their density (from very sparse to very dense cell populations, depending on the type of gating to be performed), and the ability to identify the (generally unknown) number of populations (1).
The article by Aghaeepour et al. (2) published in this issue makes an important contribution to the field by introducing flowMeans, a fast automated gating method based on an extension of k-means clustering. The new method was developed specifically to address several problematic aspects of applying k-means clustering to FCM data. These limitations include the need to identify the number of populations (k), the sensitivity of the clustering results to the initial values, and the implicit restriction to spherical cell populations, the last of which is particularly relevant to FCM data.
The flowMeans method solves the two problems of identifying k and dealing with concave cell populations by starting with a larger number of clusters (using a "reasonable" upper bound for k) and merging them, so that multiple overlapping clusters can represent the same subpopulation.
In more detail, flowMeans starts by estimating the number of modes for each one-dimensional projection of the FCM data, using an approach based on kernel density estimation as described by Duong et al. (3). The total number of modes across all dimensions is used as a maximum for k, given that this sum is an overestimate of the number of subpopulations in the multidimensional space. Because there are more clusters than needed, the resulting clusters have to be merged to determine the number of subpopulations; the process iterates by alternating between calculating the distances between pairs of clusters and merging the closest pair. The underlying empirical principle is that keeping track of the minimum distance...
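The three steps described above (counting modes in each one-dimensional projection, running k-means with the summed mode count as an upper bound for k, and repeatedly merging the closest pair of clusters) can be illustrated with a minimal Python sketch. This is not the flowMeans implementation. Several choices are assumptions made only for illustration: the bandwidth uses Silverman's rule of thumb rather than the approach of Duong et al. (3), the distance between clusters is the Euclidean distance between centroids, k-means is initialized deterministically from evenly spaced data points, and merging stops at a caller-supplied `target` number of clusters. All function names are hypothetical.

```python
import math

def count_modes(values, grid_size=200):
    """Count local maxima of a 1-D Gaussian kernel density estimate."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # Silverman's rule-of-thumb bandwidth: an assumption of this sketch.
    h = max(1.06 * sd * n ** -0.2, 1e-9)
    lo, hi = min(values) - 3 * h, max(values) + 3 * h
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalized density is enough for locating maxima.
    dens = [sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in values) for g in grid]
    return sum(1 for i in range(1, grid_size - 1)
               if dens[i - 1] < dens[i] >= dens[i + 1])

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations; returns clusters as lists of point indices."""
    n = len(points)
    # Deterministic initialization from evenly spaced points (an assumption).
    centers = [list(points[i * n // k]) for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(idx)
        new_centers = [centroid([points[i] for i in members]) if members else centers[c]
                       for c, members in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return [members for members in clusters if members]  # drop empty clusters

def merge_closest(points, clusters, target):
    """Merge the closest pair of clusters until `target` clusters remain.

    Returns the merged clusters and the minimum distance seen at each merge.
    """
    clusters = [list(members) for members in clusters]
    min_distances = []
    while len(clusters) > target:
        cents = [centroid([points[i] for i in members]) for members in clusters]
        d, a, b = min((math.dist(cents[i], cents[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
        min_distances.append(d)
    return clusters, min_distances

def flowmeans_sketch(points, target):
    """Upper bound for k from 1-D mode counts, then k-means, then merging."""
    dim = len(points[0])
    k_max = sum(count_modes([p[d] for p in points]) for d in range(dim))
    clusters = kmeans(points, k_max)
    merged, min_distances = merge_closest(points, clusters, target)
    return k_max, merged, min_distances
```

Calling `flowmeans_sketch(points, target=2)` on a list of coordinate tuples returns the upper bound used for k, the merged clusters as lists of point indices, and the sequence of minimum distances recorded at each merge. The sketch deliberately leaves the choice of `target` to the caller and only records the minimum distances, rather than guessing at a stopping rule.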