With the widespread deployment of wireless systems and intelligent devices, demand for indoor positioning services is growing, and Wi-Fi fingerprinting has become the most widely used approach to locating indoor users. A Wi-Fi received signal strength (RSS) fingerprint database can be built quickly, but positioning based on it is unstable and susceptible to noise. To improve indoor positioning accuracy, fingerprinting algorithms based on convolutional neural networks (CNNs) are often used. However, the number of reference points that participate in location estimation strongly influences positioning accuracy, and traditional methods have no standard for choosing that number. To address these problems, we fuse the grayscale images corresponding to RSS and angle of arrival (AoA) into RGB images to improve stability. We then present a position estimation method based on the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which analyzes the CNN output to adaptively select the number of reference points for each estimate. Finally, the position is estimated using weighted k-nearest neighbors (WKNN). The results show that the positioning error of the proposed method is at least 0.1–0.3 m lower than that of traditional methods.
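The idea of letting DBSCAN decide how many reference points feed the weighted k-nearest neighbors (WKNN) step can be illustrated with a minimal sketch. The abstract does not specify how the CNN output is clustered, so everything here is an assumption: the CNN is taken to emit one positive score per reference point, DBSCAN clusters the coordinates of the `top_n` highest-scoring candidates, the cluster with the largest total score is kept, and its members are averaged with their scores as weights. The function names `dbscan` and `select_and_locate` and the parameters `top_n`, `eps`, and `min_pts` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Minimal DBSCAN. Returns one cluster label per point; -1 marks noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster = -1

    def neighbors(i: int) -> List[int]:
        # Epsilon-neighborhood, including the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reclaimed from noise
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
    return [int(lab) for lab in labels]


def select_and_locate(rp_coords: Sequence[Point], scores: Sequence[float],
                      top_n: int = 6, eps: float = 1.5, min_pts: int = 2):
    """Hypothetical pipeline: CNN scores -> DBSCAN-selected RPs -> WKNN estimate.

    Assumes all scores are positive (e.g. softmax outputs).
    Returns ((x, y), number_of_reference_points_used).
    """
    # Keep the top_n reference points ranked by CNN score (assumed rule).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_n]
    cand = [rp_coords[i] for i in order]
    w = [scores[i] for i in order]

    labels = dbscan(cand, eps, min_pts)
    clusters = {lab for lab in labels if lab != -1}
    if clusters:
        # Keep the cluster carrying the largest total CNN score (assumed rule).
        best = max(clusters,
                   key=lambda c: sum(wi for wi, lab in zip(w, labels) if lab == c))
        chosen = [k for k, lab in enumerate(labels) if lab == best]
    else:
        chosen = [0]  # everything is noise: fall back to the single best RP

    # WKNN: score-weighted average of the chosen reference-point coordinates.
    total = sum(w[k] for k in chosen)
    x = sum(w[k] * cand[k][0] for k in chosen) / total
    y = sum(w[k] * cand[k][1] for k in chosen) / total
    return (x, y), len(chosen)


if __name__ == "__main__":
    rps = [(0, 0), (1, 0), (0, 1), (1, 1), (8, 8), (9, 8)]
    cnn_scores = [0.30, 0.25, 0.20, 0.05, 0.15, 0.05]
    # Roughly (0.375, 0.3125) using 4 reference points; the distant pair
    # around (8, 8) forms its own cluster and is excluded.
    print(select_and_locate(rps, cnn_scores))
```

In this toy example a fixed k = 6 would drag the estimate toward the outlying reference points near (8, 8), whereas the clustering step keeps only the four mutually close candidates. This mirrors, under the stated assumptions, the motivation for choosing the number of reference points per estimate instead of fixing it in advance.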