This paper presents an efficient weed detection method based on the latent diffusion transformer, aimed at enhancing the accuracy and applicability of agricultural image analysis. The experimental results demonstrate that the proposed model achieves a precision of 0.92, a recall of 0.89, an accuracy of 0.91, a mean average precision (mAP) of 0.91, and an F1 score of 0.90, indicating its outstanding performance in complex scenarios. Additionally, ablation experiments reveal that the latent-space-based diffusion subnetwork outperforms traditional models, such as the the residual diffusion network, which has a precision of only 0.75. By combining latent space feature extraction with self-attention mechanisms, the constructed lightweight model can respond quickly on mobile devices, showcasing the significant potential of deep learning technologies in agricultural applications. Future research will focus on data diversity and model interpretability to further enhance the model’s adaptability and user trust.