In this paper, we present a clustering approach, MK-SOM, that carries out cluster-dependent feature selection, and partitions images with multiple feature representations into clusters. This work is motivated by the observations that human visual systems (HVS) can receive various kinds of visual cues for interpreting the world. Images identified by HVS as the same category are typically coherent to each other in certain crucial visual cues, but the crucial cues vary from category to category. To account for this observation and bridge the semantic gap, the proposed MK-SOM integrates multiple kernel learning (MKL) into the training process of self-organizing map (SOM), and associates each cluster with a learnable, ensemble kernel. Hence, it can leverage information captured by various image descriptors, and discoveries the cluster-specific characteristics via learning the per-cluster ensemble kernels. Through the optimization iterations, cluster structures are gradually revealed via the features specified by the learned ensemble kernels, while the quality of these ensemble kernels is progressively improved owing to the coherent clusters by enforcing SOM. Besides, MK-SOM allows the introduction of side information to improve performance, and it hence provides a new perspective of applying MKL to address both unsupervised and semi-supervised clustering tasks. Our approach is comprehensively evaluated in the two applications. The superior and promising results manifest its effectiveness.Index Terms-Cluster-dependent feature selection, clustering, image grouping, multiple kernel learning (MKL), object categorization.