With the development of science and technology, the traditional industrial structures are constantly being upgraded. As far as drones are concerned, an increasing number of researchers are using reinforcement learning or deep learning to make drones more intelligent. At present, there are many algorithms for object detection. Although many models have a high accuracy of detection, these models have many parameters and high complexity, making them unable to perform real-time detection. Therefore, it is particularly important to design a lightweight object detection algorithm that is able to meet the needs of real-time detection using UAVs. In response to the above problems, this paper establishes a dataset of six animals in grassland from different angles and during different time periods on the basis of the remote sensing images of drones. In addition, on the basis of the Yolov5s network model, a lightweight object detector is designed. First, Squeeze-and-Excitation Networks are introduced to improve the expressiveness of the network model. Secondly, the convolutional layer of branch 2 in the BottleNeckCSP structure is deleted, and 3/4 of its input channels are directly merged with the results of branch 1 processing, which reduces the number of model parameters. Next, in the SPP module of the network model, a 3 × 3 maximum pooling layer is added to improve the receptive field of the model. Finally, the trained model is applied to NVIDIA-TX2 processor for real-time object detection. After testing, the optimized YOLOv5 grassland animal detection model was able to effectively identify six different forms of grassland animal. Compared with the YOLOv3, EfficientDet-D0, YOLOv4 and YOLOv5s network models, the mAP_0.5 value was improved by 0.186, 0.03, 0.007 and 0.011, respectively, and the mAP_0.5:0.95 value was improved by 0.216, 0.066, 0.034 and 0.051, respectively, with an average detection speed of 26 fps. The experimental results show that the grassland animal detection model based on the YOLOv5 network has high detection accuracy, good robustness, and faster calculation speed in different time periods and at different viewing angles.