The number of rice seedlings and their spatial distribution are the main agronomic components for determining rice yield. However, the above agronomic information is manually obtained through visual inspection, which is not only labor-intensive and time-consuming but also low in accuracy. To address these issues, this paper proposes RS-P2PNet, which automatically counts and locates rice seedlings through point supervision. Specifically, RS-P2PNet first adopts Resnet as its backbone and introduces mixed local channel attention (MLCA) in each stage. This allows the model to pay attention to the task-related feature in the spatial and channel dimensions and avoid interference from the background. In addition, a multi-scale feature fusion module (MSFF) is proposed by adding different levels of features from the backbone. It combines the shallow details and high-order semantic information of rice seedlings, which can improve the positioning accuracy of the model. Finally, two rice seedling datasets, UERD15 and UERD25, with different resolutions, are constructed to verify the performance of RS-P2PNet. The experimental results show that the MAE values of RS-P2PNet reach 1.60 and 2.43 in the counting task, and compared to P2PNet, they are reduced by 30.43% and 9.32%, respectively. In the localization task, the Recall rates of RS-P2PNet reach 97.50% and 96.67%, exceeding those of P2PNet by 1.55% and 1.17%, respectively. Therefore, RS-P2PNet has effectively accomplished the counting and localization of rice seedlings. In addition, the MAE and RMSE of RS-P2PNet on the public dataset DRPD reach 1.7 and 2.2, respectively, demonstrating good generalization.