Highly multiplexed in situ imaging cytometry assays have enabled researchers to scrutinize cellular systems at an unprecedented level of detail. Because these assays simultaneously profile the spatial distribution and molecular features of many cells, unsupervised machine learning, and in particular clustering, has become indispensable for identifying cell types and subsets based on these molecular features. However, the most widely used clustering approaches applied to these novel technologies were developed for cell suspension technologies. To date, no systematic evaluation has established which properties of these methods are optimal for in situ imaging assays. In this work, we systematically evaluated the performance of various similarity metrics used to quantify the similarity between cells when clustering. Our results demonstrate that cell clustering performance varies significantly when different similarity metrics are used. Lastly, we propose FuseSOM, an ensemble clustering algorithm employing hierarchical multi-view learning of similarity metrics and self-organizing maps (SOMs). Using a stratified subsampling analysis framework, FuseSOM exhibits superior clustering performance compared with the current best-practice clustering approaches for in situ imaging cytometry data analysis.
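To illustrate why the choice of similarity metric can change which cells are grouped together, the following minimal sketch compares a magnitude-based metric (Euclidean distance) with a pattern-based one (Pearson correlation distance) on three toy marker profiles. It is not the FuseSOM implementation; the cell names, marker values, and helper functions are hypothetical and chosen only for illustration.

```python
import math

def euclidean(x, y):
    """Euclidean distance: sensitive to overall marker intensity."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_distance(x, y):
    """One minus the cosine of the angle between two profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm

def pearson_distance(x, y):
    """One minus the Pearson correlation (cosine distance of centered profiles)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_distance(centered_x, centered_y)

def nearest(query, candidates, metric):
    """Return the name of the candidate profile closest to the query."""
    return min(candidates, key=lambda name: metric(query, candidates[name]))

# Hypothetical expression of three markers in three cells (made-up values).
query = [1.0, 2.0, 3.0]
candidates = {
    "cell_B": [10.0, 20.0, 30.0],  # same expression pattern, higher intensity
    "cell_C": [3.0, 2.0, 1.0],     # similar intensity, reversed pattern
}

for metric in (euclidean, pearson_distance):
    print(metric.__name__, "->", nearest(query, candidates, metric))
```

Under Euclidean distance the query is closest to the cell with similar overall intensity, whereas under correlation distance it is closest to the cell with the same relative expression pattern, so a clustering built on one metric can assign cells differently from a clustering built on the other.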