Visual localization is used for indoor navigation and is embedded in applications such as augmented reality and mixed reality. Image retrieval and geometric measurement are the two primary steps in visual localization, and the key to improving localization efficiency is reducing the time spent on image retrieval. We therefore propose a hierarchical clustering-based image-retrieval method that organizes an offline image database hierarchically, which keeps image-retrieval time within a reasonable range. The database is organized in two stages: scene-level clustering and sub-scene-level clustering. For scene-level clustering, an improved cumulative sum (CUSUM) algorithm is proposed to detect change points, and images are then grouped according to their global features. Building on the scene-level clusters, a feature tracking-based method further groups images into sub-scene-level clusters. An image-retrieval algorithm with a backtracking mechanism is designed for visual localization. In addition, a weighted KNN-based visual localization method is presented, in which the query position is estimated with the Armijo–Goldstein algorithm. Experimental results indicate that image-retrieval running time does not grow linearly with database size, which improves localization efficiency.
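The scene-level stage splits an ordered image sequence at change points. The abstract does not describe how its CUSUM is improved, so the following is only a minimal sketch of a standard one-sided (Page) CUSUM, not the authors' method. It assumes that images are stored in capture order, that consecutive global descriptors are compared by Euclidean distance, and that the median consecutive distance serves as the "same scene" baseline. The names `cusum_change_points` and `group_by_change_points` and the `drift` and `threshold` parameters are hypothetical.

```python
import numpy as np


def cusum_change_points(features, drift, threshold):
    """One-sided CUSUM over consecutive descriptor distances (illustrative sketch).

    features: (n, d) global descriptors, assumed to be in capture order.
    drift: allowance k subtracted at every step (hypothetical tuning value).
    threshold: alarm level h; exceeding it declares a change point.
    Returns the indices of images assumed to start a new scene.
    """
    feats = np.asarray(features, dtype=float)
    if len(feats) < 2:
        return []
    # Dissimilarity between each image and its predecessor.
    diffs = np.linalg.norm(np.diff(feats, axis=0), axis=1)
    # Assumed in-scene level of change: the median consecutive distance.
    baseline = np.median(diffs)
    s = 0.0
    change_points = []
    for i, d in enumerate(diffs):
        # Accumulate only the evidence that exceeds baseline + drift.
        s = max(0.0, s + d - baseline - drift)
        if s > threshold:
            change_points.append(i + 1)  # image i + 1 opens a new scene
            s = 0.0  # restart the statistic after an alarm
    return change_points


def group_by_change_points(n, change_points):
    """Turn change-point indices into contiguous scene-level clusters."""
    bounds = [0] + list(change_points) + [n]
    return [list(range(a, b)) for a, b in zip(bounds[:-1], bounds[1:]) if b > a]


feats = [[0.0, 0.0]] * 5 + [[10.0, 10.0]] * 5
cps = cusum_change_points(feats, drift=0.5, threshold=2.0)
print(cps)  # [5]
print(group_by_change_points(len(feats), cps))
```

The sketch uses a CUSUM statistic because it accumulates small deviations over time. This lets it flag a gradual transition between scenes that a single-frame distance threshold might miss, while resetting after each alarm so that every segment is detected independently.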