Traditional studies of Yangtze finless porpoise behavior rely mainly on manual observation, which suffers from low efficiency, high labor costs and observer visual fatigue. To address these issues, the Baiji Dolphinarium at the Institute of Hydrobiology, Chinese Academy of Sciences, has deployed a monitoring platform that supports accelerated playback. The platform improves observation efficiency and supports research on the sexual behaviors of Yangtze finless porpoises, with the aim of aiding their reproduction and conservation. Because the platform enables computer-vision-based monitoring of porpoise behavior, a real-time object detection algorithm that recognizes the sexual behaviors of Yangtze finless porpoises is essential. However, existing models struggle with class imbalance in finless porpoise sexual behavior data, background noise in images, and occlusion and overlap between individual porpoises. In this paper, we establish the first Yangtze Finless Porpoise Sexual Behavior dataset (YFPSB) collected in an artificial rearing environment. It consists of 4,900 images from different camera views and allows researchers to train and test new vision algorithms. We also propose an improved method based on YOLOv8 to tackle these problems. Specifically, we introduce Expanded Window Multi-Head Self-Attention (EW-MHSA) into the backbone network to enhance the model's spatial awareness. EW-MHSA captures long-range dependencies in the images while also making the original model lighter: it reduces the number of model parameters by 9.7%, yielding a model smaller than the smallest model in the YOLOv8 series. In experiments, our model achieves 96.6% mAP, demonstrating its accuracy and its potential for application in marine ecological monitoring and conservation.
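The abstract names EW-MHSA but does not specify its internals. As a rough illustration only, the following is a minimal sketch of generic window-based multi-head self-attention in the style of Swin Transformer. It splits the feature map into non-overlapping windows and computes attention only among tokens within each window. We assume, without confirmation from this text, that "expanded window" refers to enlarging the window so that each token attends to more neighbors. The function names, the `win` parameter and the random projection matrices are hypothetical and do not reproduce the authors' EW-MHSA. The code is written in plain Python for readability rather than speed.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def mhsa(tokens, wq, wk, wv, num_heads):
    """Standard multi-head self-attention over one window of N tokens (N x C)."""
    c = len(tokens[0])
    assert c % num_heads == 0, "channel count must be divisible by num_heads"
    d = c // num_heads
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    out = [[0.0] * c for _ in tokens]
    for h in range(num_heads):
        sl = slice(h * d, (h + 1) * d)  # channels belonging to head h
        for i, qi in enumerate(q):
            # Scaled dot-product scores of query i against every key in the window.
            scores = [sum(x * y for x, y in zip(qi[sl], kj[sl])) / math.sqrt(d) for kj in k]
            for j, a in enumerate(softmax(scores)):
                for t in range(d):
                    out[i][h * d + t] += a * v[j][h * d + t]
    return out


def window_mhsa(feat, win, wq, wk, wv, num_heads):
    """Apply MHSA independently inside each win x win window of an H x W x C map.

    A larger `win` (the hypothetical "expanded" window) lets each token attend
    to win * win tokens instead of fewer; win == H == W is global attention.
    """
    h, w = len(feat), len(feat[0])
    assert h % win == 0 and w % win == 0, "H and W must be divisible by win"
    out = [[None] * w for _ in range(h)]
    for r0 in range(0, h, win):
        for c0 in range(0, w, win):
            coords = [(r, c) for r in range(r0, r0 + win) for c in range(c0, c0 + win)]
            tokens = [feat[r][c] for r, c in coords]
            for (r, c), vec in zip(coords, mhsa(tokens, wq, wk, wv, num_heads)):
                out[r][c] = vec  # write attended token back to its grid position
    return out
```

With `win=2` on a 4 x 4 map, perturbing a token in one window leaves the outputs of the other windows unchanged. With `win=4`, every token attends to the whole map. Increasing the window therefore trades compute for a wider receptive field, which is one plausible reading of how an expanded window helps capture long-range dependencies.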