Abstract. The aim of this paper is to develop a method for clustering image views of the same object class. Local invariant feature methods, such as SIFT, have proven effective for image clustering. However, they either make relatively little use of geometric constraints or use them in an overly complex way, and they are confounded when the detected features are superabundant. Here we make two contributions aimed at overcoming these problems. First, we rank the SIFT points by visual saliency (R-SIFT). Second, we use the reduced set of R-SIFT features to construct a specific hypergraph (CSHG) model of holistic structure. Based on the CSHG model, we propose a two-stage clustering method in which images are clustered according to the pairwise similarity of their graphs. This similarity combines the traditional similarity of local invariant feature vectors with the geometric similarity between the two graphs. The method therefore uses both SIFT features and geometric constraints, and hence combines global and local information. Experiments show that the method gives excellent clustering performance.
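
The two ideas summarized above, ranking features by saliency and combining feature-vector similarity with geometric similarity, can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the function names, the top-k cutoff, the use of cosine similarity, and the weighted sum with a hypothetical weight `alpha` are all assumptions, since the abstract does not state how the ranking is truncated or how the two similarities are combined.

```python
from math import sqrt


def rank_by_saliency(keypoints, saliency, k):
    """Keep the k keypoints with the highest saliency scores.

    Assumption: a simple top-k cutoff stands in for the R-SIFT ranking step.
    """
    order = sorted(range(len(keypoints)), key=lambda i: saliency[i], reverse=True)
    return [keypoints[i] for i in order[:k]]


def cosine(u, v):
    """Cosine similarity between two descriptor vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(feature_sim, geometric_sim, alpha=0.5):
    """Blend feature-vector similarity with geometric similarity.

    Assumption: a convex combination with weight alpha in [0, 1];
    the actual combination rule is not given in the abstract.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * feature_sim + (1.0 - alpha) * geometric_sim


# Hypothetical usage: three keypoints with made-up saliency scores.
kept = rank_by_saliency(["p0", "p1", "p2"], [0.2, 0.9, 0.5], k=2)
score = combined_similarity(cosine([1.0, 0.0], [1.0, 0.0]), 0.4, alpha=0.5)
```

The resulting pairwise scores could then be passed to any similarity-based clustering routine; which one the two-stage method uses is not stated here.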