Advances in computer vision and image processing have improved smart city management and driven progress in intelligent visual detection and tracking. Single-camera monitoring systems suffer from a limited observation range, unstable tracking, and difficulty recognizing targets that are occluded in complex scenes. Multi-camera monitoring systems are needed to overcome these limitations. To improve both the accuracy and the efficiency of multi-camera positioning and target recognition, this study combines spatial mapping based on position data with feature matching based on the target objects. First, within the overlapping field of view of multiple cameras, a uniform spatial constraint method is used to map and match target objects, with the color features of each target used for matching. Second, the You Only Look Once (YOLO) object detection algorithm is introduced to recognize targets within the overlapping area of the cameras using a homography transformation. On this basis, a multi-camera positioning technology based on the YOLO object detection algorithm is designed. In testing, the YOLOv5 algorithm reaches a maximum mean average precision (mAP) of 97.2% on the test set. At an inference time of 10 ms, its maximum mAP is 51.6%. The average values of the classification loss, target loss, and GIoU loss of the YOLOv5 algorithm are 0.001, 0.01, and 0.015, respectively. On the DukeMTMC-reID dataset, the probability that the YOLO positioning error is within 10 cm remains above 96.5%. On the OTB dataset, the probability that the error is within 9.5 cm remains above 95%. When the target object is occluded, the highest accuracy of the YOLO positioning system is 0.74.
These results indicate that the multi-camera localization technology based on the YOLO object detection algorithm improves the accuracy of localization and recognition. It also helps address the recognition of occluded objects and the continuous tracking of objects across cameras.
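
As a rough illustration of the two-stage matching described above, the following minimal sketch projects a detection from one camera into another through a 3x3 homography, applies a spatial gate, and then picks the candidate with the most similar color histogram. It is not the implementation used in this study. The function names, the detection dictionary layout (`foot`, `hist`), the histogram intersection measure, and the thresholds `max_dist` and `min_sim` are all assumptions made for illustration, and the detections are assumed to come from a detector such as YOLO.

```python
from math import hypot


def apply_homography(H, point):
    """Map an image point (x, y) through a 3x3 homography H (nested lists)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under H")
    return (u / w, v / w)


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized color histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def match_across_cameras(H_ab, dets_a, dets_b, max_dist=50.0, min_sim=0.5):
    """Greedily match detections in camera A to detections in camera B.

    Each detection is a dict with hypothetical keys:
      "foot": (x, y) reference point of the bounding box in image pixels
      "hist": normalized color histogram of the detected object
    A candidate in B must first pass the spatial gate (projected distance
    at most max_dist), then the one with the highest color similarity
    (at least min_sim) is chosen.
    """
    matches = {}
    used = set()
    for i, det_a in enumerate(dets_a):
        px, py = apply_homography(H_ab, det_a["foot"])
        best_j, best_sim = None, min_sim
        for j, det_b in enumerate(dets_b):
            if j in used:
                continue
            bx, by = det_b["foot"]
            if hypot(px - bx, py - by) > max_dist:
                continue  # fails the spatial constraint
            sim = histogram_intersection(det_a["hist"], det_b["hist"])
            if sim >= best_sim:
                best_j, best_sim = j, sim
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches
```

In practice the homography would be estimated from corresponding points in the overlapping area, and the spatial gate keeps visually similar but distant objects from being confused before the color comparison is applied.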