With the rapid development of surveying and spatial information technologies, more and more attention has been given to positioning. In outdoor environments, people can easily obtain positioning services through global navigation satellite systems (GNSS). In indoor environments, the GNSS signal is often lost, while other positioning problems, such as dead reckoning and wireless signals, will face accumulated errors and signal interference. Therefore, this research uses images to realize a positioning service. The main concept of this work is to establish a model for an indoor field image and its coordinate information and to judge its position by image eigenvalue matching. Based on the architecture of PoseNet, the image is input into a 23-layer convolutional neural network according to various sizes to train end-to-end location identification tasks, and the three-dimensional position vector of the camera is regressed. The experimental data are taken from the underground parking lot and the Palace Museum. The preliminary experimental results show that this new method designed by us can effectively improve the accuracy of indoor positioning by about 20% to 30%. In addition, this paper also discusses other architectures, field sizes, camera parameters, and error corrections for this neural network system. The preliminary experimental results show that the angle error correction method designed by us can effectively improve positioning by about 20%.