Detecting road potholes is crucial for passenger comfort and the structural safety of vehicles. To address the challenges of pothole detection in complex road environments, this paper proposes PD‐YOLO (pothole detection you only look once), a model that focuses on shape features. The fixed convolutional kernels of the baseline model limit its multi‐scale feature learning; PD‐YOLO overcomes this limitation by constructing a feature extraction module that better adapts to variations in pothole shape. A cross‐stage partial network is then designed using a one‐time aggregation method, which simplifies the model while enabling the network to fuse information across feature maps from different stages. Additionally, a dynamic sparse attention mechanism is introduced to select relevant features, reducing redundancy and suppressing background noise. Experiments on the VOC2007 and GRDDC2020_Pothole datasets show that, compared with the baseline model YOLOv8, PD‐YOLO improves mean average precision by 3.9% and 2.8%, respectively, at a frame rate of approximately 290 frames per second, meeting the accuracy and real‐time requirements of pothole detection. The code and dataset for this paper are available at: https://github.com/woyijiankou/PD‐YOLO.