The estimation of vehicle loads is an emerging research topic in bridge structural health monitoring (SHM). Traditional methods, such as bridge weigh-in-motion (BWIM) systems, are widely used but fail to record the locations of vehicles on bridges. Computer vision-based approaches are a promising way to track vehicles on bridges. Nevertheless, tracking vehicles across the video frames of multiple cameras without an overlapping visual field poses a challenge for vehicle tracking over the whole bridge. In this study, a method based on You Only Look Once v4 (YOLOv4) and Omni-Scale Network (OSNet) was proposed to detect and track vehicles across multiple cameras. A modified IoU-based tracking method was proposed to track a vehicle in adjacent video frames from the same camera, taking both the appearance of vehicles and the overlap rates between vehicle bounding boxes into consideration. The Hungarian algorithm was adopted to match vehicle images across different videos. Moreover, a dataset containing 25,080 images of 1727 vehicles was established for vehicle identification and used to train and evaluate four models. Field validation experiments based on videos from three surveillance cameras were conducted to validate the proposed method. Experimental results show that the proposed method achieves an accuracy of 97.7% for vehicle tracking within the visual field of a single camera and over 92.5% for tracking across multiple cameras, which can contribute to acquiring the temporal–spatial distribution of vehicle loads over the whole bridge.
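To illustrate the kind of matching step the abstract describes, the sketch below combines bounding-box overlap (IoU) with an appearance similarity score and solves the assignment with the Hungarian algorithm. This is a minimal illustration, not the paper's implementation: the function names, the equal weighting of the two cues, the cosine-similarity choice, and the acceptance threshold are all assumptions introduced here for clarity; only the general idea (fusing IoU and appearance, then applying Hungarian matching) comes from the abstract.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); returns intersection-over-union in [0, 1].
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def cosine_similarity(feat_a, feat_b):
    # Appearance similarity between re-identification embeddings
    # (e.g., feature vectors produced by an OSNet-style network).
    return float(np.dot(feat_a, feat_b) /
                 (np.linalg.norm(feat_a) * np.linalg.norm(feat_b) + 1e-9))

def match_detections(prev_tracks, detections, w_iou=0.5, w_app=0.5, min_score=0.3):
    """Match existing tracks to new detections with the Hungarian algorithm.

    prev_tracks / detections: lists of dicts with 'box' and 'feature' keys.
    The combined score weights box overlap and appearance similarity;
    the weights and threshold here are illustrative placeholders.
    """
    score = np.zeros((len(prev_tracks), len(detections)))
    for i, trk in enumerate(prev_tracks):
        for j, det in enumerate(detections):
            score[i, j] = (w_iou * iou(trk["box"], det["box"]) +
                           w_app * cosine_similarity(trk["feature"], det["feature"]))
    # linear_sum_assignment minimizes cost, so negate the similarity matrix.
    rows, cols = linear_sum_assignment(-score)
    # Keep only pairs whose combined score clears the threshold.
    return [(i, j) for i, j in zip(rows, cols) if score[i, j] >= min_score]
```

For cross-camera association, where IoU is unavailable because the views do not overlap, the same Hungarian assignment can in principle be run on the appearance-similarity term alone; again, this reading is inferred from the abstract rather than taken from the paper's stated procedure.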