With the growing maintenance demands of wind turbines, unmanned aerial vehicle (UAV) technology is widely used for turbine inspection. However, wind turbine blades are long and slender with weak texture features, which leads to target confusion when tracking specific parts of the moving blades. In addition, wind turbine units are large dynamic structures that often exceed the camera's field of view (FOV) and exhibit distinctive motion characteristics, so visual tracking of specific components is unstable for lack of global motion information. To address these challenges and achieve consistent localization of critical components while the turbine is in motion, this study integrates the Squeeze-and-Excitation Network (SENet) into the backbone of YOLOv5 and introduces two hyperparameters into the existing loss function to control the weights of unbalanced samples, thereby improving detection accuracy. In the DeepSORT tracking algorithm, multiple Long Short-Term Memory (LSTM) units predict the trajectory of the rotor blade's center point, and an optimized Kalman filter improves the system's adaptability and precision. Experimental results show that the method accurately distinguishes individual blades and specific blade sections; the enhanced fusion model improves the mean average precision (mAP_0.5) by 5.3%. The method also maintains continuous, stable tracking of moving blades even when rotor blades frequently enter and exit the FOV. This approach strengthens the case for the widespread use of drones in wind turbine inspections.
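The abstract does not give the exact form of the two-hyperparameter loss for unbalanced samples. As a minimal sketch only, one common design in this spirit is a focal-style binary loss with a class-balance weight α and a hard-example exponent γ; the paper's actual formulation may differ.

```python
import numpy as np

def weighted_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Hypothetical two-hyperparameter loss for unbalanced samples.

    p: predicted probabilities in (0, 1); y: binary labels {0, 1}.
    alpha re-weights the positive class; gamma down-weights easy,
    well-classified examples. This mirrors the well-known focal loss;
    it is an illustrative assumption, not the paper's exact loss.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1.0 - p)            # probability assigned to the true class
    w = np.where(y == 1, alpha, 1.0 - alpha)     # class-balance weight
    return float(np.mean(-w * (1.0 - pt) ** gamma * np.log(pt)))
```

With γ = 0 and α = 0.5 this reduces to a uniformly scaled cross-entropy; raising γ shrinks the contribution of confident, easy predictions so that hard or rare samples dominate the gradient.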