Place recognition is an important perceptual problem in robotics, especially for navigation. Previous place-recognition approaches have been applied to the 'global localization' and 'kidnapped robot' problems, but they usually operate in a supervised mode. In this paper, a robust appearance-based unsupervised place clustering and recognition algorithm is introduced. The method fuses image features, described using speeded-up robust features (SURF), by agglomerating them into a union of features within each place cluster. The number of place clusters is determined by examining the SURF-based scene similarity diagram between adjacent images. During a human-guided learning step, the robot captures visual information with an embedded camera and converts it into topological place clusters. Experimental results show the robustness, accuracy, and efficiency of the method, as well as its ability to create topological place clusters for solving the global localization and kidnapped robot problems. The developed system performs well in terms of processing time, clustering error, and recognition precision.
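The clustering idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each image has already been reduced to an array of descriptors (for example, 64-dimensional SURF descriptors from any extractor), scores adjacent images by the fraction of descriptors that pass a nearest-neighbour ratio test, starts a new place cluster whenever that score falls below a fixed threshold, and keeps a union of the unmatched descriptors in each cluster. The function names, the `ratio` and `threshold` values, and the similarity measure itself are all hypothetical choices made for illustration.

```python
import numpy as np


def match_mask(desc_a, desc_b, ratio=0.7):
    """Boolean mask over desc_a: True where a descriptor has a match in desc_b.

    Assumption: a match is a nearest neighbour that passes a ratio test
    against the second-nearest neighbour (an illustrative choice).
    """
    if len(desc_a) == 0 or len(desc_b) < 2:
        return np.zeros(len(desc_a), dtype=bool)
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    # After partitioning, columns 0 and 1 hold the two smallest distances.
    part = np.partition(dists, 1, axis=1)
    return part[:, 0] < ratio * part[:, 1]


def similarity(desc_a, desc_b, ratio=0.7):
    """Fraction of matched descriptors, normalised by the smaller set."""
    denom = min(len(desc_a), len(desc_b))
    if denom == 0:
        return 0.0
    return float(match_mask(desc_a, desc_b, ratio).sum()) / denom


def cluster_places(descriptors, threshold=0.3, ratio=0.7):
    """Cut an image sequence into place clusters at low adjacent similarity.

    Returns (labels, adjacent similarities, per-cluster feature unions).
    The fixed threshold is an assumption of this sketch.
    """
    if not descriptors:
        return [], [], []
    sims = [similarity(a, b, ratio)
            for a, b in zip(descriptors, descriptors[1:])]
    labels = [0]
    for s in sims:
        labels.append(labels[-1] + int(s < threshold))
    unions = []
    for desc, label in zip(descriptors, labels):
        if label == len(unions):
            # First image of a new cluster seeds its feature union.
            unions.append(desc.copy())
        else:
            # Add only descriptors not already represented in the union.
            new = ~match_mask(desc, unions[label], ratio)
            unions[label] = np.vstack([unions[label], desc[new]])
    return labels, sims, unions


def recognize(query_desc, unions, ratio=0.7):
    """Return the index of the cluster whose union best matches the query."""
    scores = [similarity(query_desc, u, ratio) for u in unions]
    return int(np.argmax(scores))
```

In this sketch, `cluster_places` would be run once on the image sequence gathered during the guided learning step, and `recognize` would then be called on a new image's descriptors to pick the most similar place cluster, which is the kind of lookup that global localization and kidnapped-robot recovery rely on.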