To address the difficulty of meeting accuracy and speed requirements simultaneously in camera-based multi-ship target detection and tracking in complex scenes, an improved YOLOv4 algorithm is proposed. It simplifies the network of the feature extraction layer to retain more shallow feature information and keep the features of small ship targets from vanishing, and it replaces consecutive convolution operations with a residual network to mitigate network degradation and vanishing gradients. In addition, a nonlinear target tracking model based on the unscented Kalman filter (UKF) is constructed to address the low real-time performance and low precision of multi-ship target tracking. Multi-ship target detection and tracking experiments were carried out in scenes with large differences in ship size, strong background interference, tilted images, backlighting, insufficient illumination, and rain. Experimental results show that the proposed detection algorithm achieves an average precision of 0.945 at a processing speed of about 34.5 frames per second, so its real-time performance is much better than that of the compared algorithms while high precision is maintained. Furthermore, the multiple object tracking accuracy (MOTA) and the multiple object tracking precision (MOTP) of the proposed algorithm are 76.4 and 80.6, respectively, both better than those of the compared algorithms. The proposed method detects and tracks ship targets with few missed and false detections, and it offers good accuracy and real-time performance. The experimental results provide a valuable theoretical reference for the further practical application of the method.
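For readers unfamiliar with the tracking metrics quoted above, the following is a minimal sketch of how MOTA and MOTP are commonly computed. It assumes the widely used CLEAR MOT style definitions (MOTA as one minus the ratio of total errors to ground-truth objects, MOTP as the mean overlap of matched pairs); the abstract does not state which exact variant was used, so this is an illustration rather than the paper's evaluation code. The function names `mota` and `motp` and the numbers in the usage lines are hypothetical and are not taken from the paper's experiments.

```python
from typing import Sequence


def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over frames.

    Assumed CLEAR MOT style definition; not confirmed by the abstract.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


def motp(overlaps: Sequence[float]) -> float:
    """MOTP as the mean overlap (e.g. IoU) over all matched track/target pairs.

    Assumed overlap-based variant; a distance-based variant also exists.
    """
    if not overlaps:
        raise ValueError("overlaps must contain at least one matched pair")
    return sum(overlaps) / len(overlaps)


# Illustrative (made-up) counts: 10 misses, 5 false alarms, 1 identity
# switch over 100 ground-truth objects, and three matched pairs.
example_mota = mota(10, 5, 1, 100)
example_motp = motp([0.8, 0.9, 0.7])
```

Under these definitions, missed detections, false detections, and identity switches all lower MOTA, whereas MOTP reflects only how tightly matched boxes overlap their targets.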