This paper proposes a new method of target localization and tracking. The method consists of four parts. The first part is to divide the scene into multiple cells based on the camera’s parameters and calibrate the position and error of each vertex. The second part mainly uses the bounding box detection algorithm, YOLOv4, based on deep learning to detect and recognize the scene image sequence and obtain the type, length, width, and position of the target to be tracked. The third part is to match each vertex of the cell in the image and the cell in the scene, generate a homography matrix, and then use the PnP model to calculate the precise world coordinates of the target in the image. In this process, a cell-based accuracy positioning method is proposed for the first time. The fourth part uses the proposed PTH model to convert the obtained world coordinates into P, T, and H values for the purpose of actively tracking and observing the target in the scene with a PTZ camera. The proposed method achieved precise target positioning and tracking in a 50 cm ∗ 250 cm horizontal channel and a vertical channel. The experimental results show that the method can accurately identify the target to be tracked in the scene, can actively track the moving target in the observation scene, and can obtain a clear image and accurate trajectory of the target. It is verified that the maximum positioning error of the proposed cell-based positioning method is 2.31 cm, and the average positioning error is 1.245 cm. The maximum error of the proposed tracking method based on the PTZ camera is 1.78 degrees, and the average error is 0.656 degrees.