In visual indoor positioning systems, the method of constructing a visual map by point-by-point sampling is widely used due to its characteristics of clear static images and simple coordinate calculation. However, too small a sampling interval will cause image redundancy, while too large a sampling interval will lead to the absence of any scene images, which will result in worse positioning efficiency and inferior positioning accuracy. As a result, this paper proposed a visual map construction method based on pre-sampled image features matching, according to the epipolar geometry of adjacent position images, to determine the optimal sampling spacing within the constraints and effectively control the database size while ensuring the integrity of the image information. In addition, in order to realize the rapid retrieval of the visual map and reduce the positioning error caused by the time overhead, an image retrieval method based on deep hashing was also designed in this paper. This method used a convolutional neural network to extract image features to construct the semantic similarity structure to guide the generation of hash code. Based on the log-cosh function, this paper proposed a loss function whose function curve was smooth and not affected by outliers, and then integrated it into the deep network to optimize parameters, for fast and accurate image retrieval. Experiments on the FLICKR25K dataset and the visual map proved that the method proposed in this paper could achieve sub-second image retrieval with guaranteed accuracy, thereby demonstrating its promising performance.