SUMMARYManifold learning based image clustering models are usually employed at local level to deal with images sampled from nonlinear manifold. Multimode patterns in image data matrices can vary from nominal to significant due to images with different expressions, pose, illumination, or occlusion variations. We show that manifold learning based image clustering models are unable to achieve well separated images at local level for image datasets with significant multimode data patterns. Because gray level image features used in these clustering models are not able to capture the local neighborhood structure effectively for multimode image datasets. In this study, we use nearest neighborhood quality (NNQ) measure based criterion to improve local neighborhood structure in terms of correct nearest neighbors of images locally. We found Gist as the optimal image descriptor among HOG, Gist, SUN, SURF, and TED image descriptors based on an overall maximum NNQ measure on 10 benchmark image datasets. We observed significant performance improvement for recently reported clustering models such as Spectral Embedded Clustering (SEC) and Nonnegative Spectral Clustering with Discriminative Regularization (NSDR) using proposed approach. Experimentally, significant overall performance improvement of 10.5% (clustering accuracy) and 9.2% (normalized mutual information) on 13 benchmark image datasets is observed for SEC and NSDR clustering models. Further, overall computational cost of SEC model is reduced to 19% and clustering performance for challenging outdoor natural image databases is significantly improved by using proposed NNQ measure based optimal image representations. key words: multimode image clustering, spectral embedded clustering, nonnegative spectral constraint, NNQ measure, Gist image descriptor