Flexible structures tilt as they vibrate, which poses a significant challenge for vision-based vibration measurement. The target tracked in the captured images undergoes unknown geometric deformations over time, including changes in scale, position, and angle, and the horizontal (axis-aligned) rectangular box used for matching during target detection encloses additional background, which raises the false detection rate. These subtle deformations and false detections can cause severe fitting errors in the displacement curves regressed by a visual vibration measurement algorithm. To improve the accuracy and robustness of target recognition in vibration images, this article takes a flexible body captured by a high-speed camera as the target of vibration displacement measurement and introduces deep-learning-based rotated object detection into the field of visual vibration measurement, verifying the feasibility of deep learning for flexible-body vibration measurement. Building on a deep convolutional neural network framework, we propose a high-precision displacement measurement algorithm based on single-stage, anchor-free rotated object detection. The algorithm first uses a CSPDarknet backbone to extract multi-scale features from image sequences of the flexible structure. It then uses PANet to fuse, along top-down and bottom-up paths, the four feature maps of the bridge target produced by the backbone, so that shallow and deep information are combined in semantic feature fusion; together with the Coordinate Attention mechanism, this locates the target and refines its position on the feature maps. Finally, the coordinates of the bounding box predicted at test time are used to regress the positional offset of the target's center point.
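The final step, turning per-frame rotated bounding boxes into a displacement series, can be illustrated with a minimal sketch. It is not the authors' implementation: the box parameterization `(cx, cy, w, h, angle)`, the function names, and the `mm_per_pixel` calibration factor are all assumptions introduced for illustration, and displacement is taken relative to the first frame.

```python
import math
from typing import List, Tuple

# Assumed parameterization of a rotated box: center, size, and angle in radians.
RotatedBox = Tuple[float, float, float, float, float]  # (cx, cy, w, h, angle)


def box_corners(box: RotatedBox) -> List[Tuple[float, float]]:
    """Return the four corner points of a rotated box in image coordinates."""
    cx, cy, w, h, angle = box
    c, s = math.cos(angle), math.sin(angle)
    half_extents = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each corner offset about the center, then translate.
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half_extents]


def center_from_corners(corners: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Recover the box center as the mean of its corner coordinates."""
    n = len(corners)
    return sum(p[0] for p in corners) / n, sum(p[1] for p in corners) / n


def displacement_series(
    boxes: List[RotatedBox], mm_per_pixel: float = 1.0
) -> List[Tuple[float, float]]:
    """Center-point offset of every frame relative to the first frame.

    mm_per_pixel is a hypothetical calibration factor (assumption); with the
    default of 1.0 the result stays in pixels.
    """
    x0, y0 = center_from_corners(box_corners(boxes[0]))
    series = []
    for box in boxes:
        x, y = center_from_corners(box_corners(box))
        series.append(((x - x0) * mm_per_pixel, (y - y0) * mm_per_pixel))
    return series


# Three made-up detections of the same target as it moves and tilts.
boxes = [
    (100.0, 50.0, 40.0, 10.0, 0.0),
    (100.0, 52.0, 40.0, 10.0, 0.1),
    (101.0, 49.0, 40.0, 10.0, -0.05),
]
disp = displacement_series(boxes, mm_per_pixel=0.5)
```

Because the center is computed from the rotated corners, the estimate does not depend on the tilt angle, which is the property that makes a rotated box preferable to a horizontal one for an inclined target.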
To verify the accuracy of the algorithm, we conducted experiments on a cable-stayed bridge model and on an actual bridge, and compared its performance with a traditional template matching algorithm, the differential optical flow method, several deep learning algorithms with different localization principles, and displacement signals collected and processed from accelerometers. Time-frequency analysis of the experimental results shows that the vibration displacement trajectories regressed by the proposed algorithm overlap most closely with the displacement measurements from the accelerometers, indicating that the algorithm has good application potential for condition monitoring of flexible structures.
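One way such an agreement check could be set up is sketched below. The abstract does not state which metrics were used, so the two shown here (Pearson correlation in the time domain and the dominant frequency from a plain discrete Fourier transform) are assumptions chosen for illustration, as are the function names and the synthetic signals.

```python
import cmath
import math
from typing import List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def dominant_frequency(signal: List[float], fs: float) -> float:
    """Frequency (Hz) of the largest non-DC bin of a naive O(n^2) DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]  # remove the DC offset
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fs / n


# Synthetic stand-ins: a 2 Hz vibration sampled at 100 Hz for 2 s.
fs = 100.0
n = 200
vision = [math.sin(2 * math.pi * 2.0 * t / fs) for t in range(n)]
# Reference with a small gain error and constant offset (made-up values).
reference = [1.05 * v + 0.02 for v in vision]

corr = pearson(vision, reference)
f_vision = dominant_frequency(vision, fs)
f_reference = dominant_frequency(reference, fs)
```

For real recordings the two signals would first need to be resampled to a common rate and aligned in time, and an FFT routine would replace the naive transform used here for self-containment.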