To locate mobile robots in three-dimensional indoor environments, which are mostly global navigation satellite system (GNSS)-denied spaces, a monocular visual space positioning algorithm based on a deep neural network is proposed. First, the authors employ the lightweight YOLOv5 algorithm for target detection and deploy the model with the LibTorch deep learning framework to improve inference speed. Next, a multi-layer perceptron (MLP) neural network with four inputs and two outputs is constructed to regress the site coordinates of the robot and thereby complete target localization. This method is compared with a mathematical-model-based solving algorithm to demonstrate the accuracy and superiority of the positioning algorithm based on the deep neural network. The proposed positioning and tracking system has been successfully applied in the International Conference on Robotics and Automation (ICRA) robot competition, and the results show that the mean positioning error of the authors' method is within 10 cm while maintaining good real-time performance.
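The four-input, two-output regression step can be illustrated with a minimal sketch of an MLP forward pass. Several details here are assumptions that the abstract does not state: the four inputs are taken to be a normalized detection box (centre x, centre y, width, height), the network is given two hidden layers of 16 ReLU units, and the weights are random placeholders rather than trained values. The names `build_mlp` and `predict_site_xy` are hypothetical.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer.

    Weights are small random placeholders, NOT trained values.
    """
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, relu=False):
    """Apply one dense layer: y = W x + b, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    if relu:
        out = [max(0.0, v) for v in out]
    return out


def build_mlp(hidden=16, seed=0):
    """Build a 4 -> hidden -> hidden -> 2 network (hidden size is assumed)."""
    rng = random.Random(seed)
    return [
        init_layer(4, hidden, rng),
        init_layer(hidden, hidden, rng),
        init_layer(hidden, 2, rng),
    ]


def predict_site_xy(mlp, box):
    """Map an assumed (cx, cy, w, h) box to a 2-D site coordinate (x, y)."""
    h = list(box)
    for layer in mlp[:-1]:
        h = dense(h, layer, relu=True)
    # Linear output layer: regression targets are unbounded coordinates.
    return tuple(dense(h, mlp[-1]))


mlp = build_mlp()
x, y = predict_site_xy(mlp, (0.52, 0.61, 0.08, 0.12))
print(f"predicted site coordinates (untrained): x={x:.3f}, y={y:.3f}")
```

With untrained weights the printed coordinates carry no meaning; the sketch only shows the input and output shapes of the regression. In practice such a network would be fitted on pairs of detection boxes and measured site positions.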