The physiological well-being of dairy cows is closely tied to their behavior. Real-time, reliable behavior monitoring makes it possible to detect abnormal cows early and to reduce financial losses on farms. However, behavior data collected in real environments exhibit dense occlusion and multi-scale targets, both of which degrade detection results. We therefore address both data processing and model construction to improve dairy cow behavior detection. On the data side, we use a mixed data augmentation method to provide the model with rich cow behavior features. On the model side, we refine the detector to perform better under challenging conditions such as dense occlusion and varying scales. First, we construct a Res2 backbone that incorporates multi-scale receptive fields, improving the YOLOv3 backbone for the multi-scale features of dairy cow behaviors. In addition, we optimize the YOLOv3 detectors to locate individual cows accurately in environments of differing density by incorporating the global location information of images, and we design the Global Context Predict Head to improve behavior recognition in crowded surroundings. The resulting model achieves accuracies of 90.6%, 91.7%, 80.7%, and 98.5% for the four behaviors of standing, lying, walking, and mounting, respectively. The average accuracy is 90.4%, which is 1.2% and 12.9% higher than the detection results of YOLOv3 and YOLO-tiny, respectively, among the models compared. Compared with YOLOv3, the model's Average Precision improves by 2.6% and 1.4% for walking and standing, respectively, two behaviors with similar features. These recognition results indicate that the model generalizes better when recognizing dairy cow behaviors in videos from various scenes with multi-scale and dense-environment characteristics.
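As a quick consistency check on the reported figures, the following minimal sketch recomputes the average from the four per-behavior accuracies. It assumes that the reported average is the unweighted mean over the four behaviors, which the text does not state explicitly; the names `per_behavior_accuracy` and `mean_accuracy` are illustrative and not taken from the authors' code.

```python
# Per-behavior accuracies (%) as reported in the text.
per_behavior_accuracy = {
    "standing": 90.6,
    "lying": 91.7,
    "walking": 80.7,
    "mounting": 98.5,
}


def mean_accuracy(scores):
    """Unweighted mean over behaviors (an assumption about how the average was formed)."""
    return sum(scores.values()) / len(scores)


mean_acc = mean_accuracy(per_behavior_accuracy)

# Print the mean rounded to one decimal place for comparison with the reported average.
print(f"mean accuracy: {mean_acc:.1f}%")
```

Under this assumption, the rounded mean agrees with the reported average of 90.4%. A mean weighted by the number of instances per behavior could give a different value.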