Background
Manual measurement of lumbosacral radiographic parameters is time-consuming and laborious, and it inevitably introduces considerable variability. This study aimed to develop and evaluate a deep learning-based model that automatically measures lumbosacral radiographic parameters on lateral lumbar radiographs.

Methods
We retrospectively collected 1,240 lateral lumbar radiographs to develop the model. The images were randomly divided into training, validation, and test sets in a ratio of approximately 8:1:1, used for model training, fine-tuning, and performance evaluation, respectively. The parameters measured were lumbar lordosis (LL), the sacral horizontal angle (SHA), the intervertebral space angle (ISA) at L4–L5 and L5–S1, and the percentage of lumbar spondylolisthesis (PLS) at L4–L5 and L5–S1. The model located key points from its image segmentation output and calculated the parameters from these points. The average of the key-point annotations made by three spine surgeons served as the reference standard. Model performance was evaluated using the percentage of correct key points (PCK), the intraclass correlation coefficient (ICC), the Pearson correlation coefficient (r), the mean absolute error (MAE), the root mean square error (RMSE), and box plots.

Results
The mean differences between the model and the reference standard for LL, SHA, ISA (L4–L5), ISA (L5–S1), PLS (L4–L5), and PLS (L5–S1) were 1.69°, 1.36°, 1.55°, 1.90°, 1.60%, and 2.43%, respectively.
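The abstract states only that the parameters are calculated from key points. A minimal geometric sketch of how such calculations can work is shown below, assuming each endplate is represented by two key points (anterior and posterior corners) in pixel coordinates. The parameter definitions are common textbook conventions assumed here, not confirmed by the study: LL as the angle between the L1 and S1 superior endplates, SHA as the angle between the S1 superior endplate and the image horizontal, ISA as the angle between the facing endplates of a segment, and PLS as a Meyerding/Taillard-style slip ratio. All function names are hypothetical.

```python
import math
from typing import Tuple

# Hypothetical key-point type: (x, y) pixel coordinates; image y-axis points down.
Point = Tuple[float, float]


def line_angle_deg(p: Point, q: Point) -> float:
    """Orientation of the line through p and q, folded into [0, 180) degrees."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0


def angle_between_lines(a1: Point, a2: Point, b1: Point, b2: Point) -> float:
    """Acute angle (0-90 degrees) between line a1-a2 and line b1-b2.

    Folding to the acute angle assumes the true angle never exceeds 90 degrees,
    which is typical for LL, SHA, and ISA but is not guaranteed.
    """
    diff = abs(line_angle_deg(a1, a2) - line_angle_deg(b1, b2)) % 180.0
    return min(diff, 180.0 - diff)


def lumbar_lordosis(l1_sup_ant: Point, l1_sup_post: Point,
                    s1_sup_ant: Point, s1_sup_post: Point) -> float:
    """Assumed LL definition: angle between the L1 and S1 superior endplates."""
    return angle_between_lines(l1_sup_ant, l1_sup_post, s1_sup_ant, s1_sup_post)


def sacral_horizontal_angle(s1_sup_ant: Point, s1_sup_post: Point) -> float:
    """Assumed SHA definition: S1 superior endplate vs. the image horizontal.

    This assumes the radiograph is not rotated relative to true horizontal.
    """
    return angle_between_lines(s1_sup_ant, s1_sup_post, (0.0, 0.0), (1.0, 0.0))


def intervertebral_space_angle(upper_inf_ant: Point, upper_inf_post: Point,
                               lower_sup_ant: Point, lower_sup_post: Point) -> float:
    """Assumed ISA definition: angle between the facing endplates of one segment."""
    return angle_between_lines(upper_inf_ant, upper_inf_post,
                               lower_sup_ant, lower_sup_post)


def slip_percentage(upper_inf_post: Point, lower_sup_ant: Point,
                    lower_sup_post: Point) -> float:
    """Assumed PLS definition: displacement of the upper vertebra's
    posterior-inferior corner, projected onto the lower superior endplate
    (posterior -> anterior), as a percentage of that endplate's length.
    Positive values indicate anterior slip; negative values, retrolisthesis.
    """
    ex = lower_sup_ant[0] - lower_sup_post[0]
    ey = lower_sup_ant[1] - lower_sup_post[1]
    length = math.hypot(ex, ey)
    vx = upper_inf_post[0] - lower_sup_post[0]
    vy = upper_inf_post[1] - lower_sup_post[1]
    slip = (vx * ex + vy * ey) / length  # scalar projection onto the endplate
    return 100.0 * slip / length


if __name__ == "__main__":
    # Hypothetical S1 endplate key points, not data from the study.
    print(round(sacral_horizontal_angle((300.0, 520.0), (220.0, 470.0)), 1))  # 32.0
```

Because the slip direction is taken from the posterior and anterior endplate corners themselves, the PLS sketch does not depend on whether the patient faces left or right in the image.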
Compared with the reference standard, the model's measurements showed strong correlation and agreement (LL, SHA, and ISA: ICC = 0.91–0.97, r = 0.91–0.96, MAE = 1.89°–2.47°, RMSE = 2.32°–3.12°; PLS: ICC = 0.90–0.92, r = 0.90–0.91, MAE = 1.95%–2.93%, RMSE = 2.52%–3.70%), and the differences between them were not statistically significant (p > 0.05).

Conclusion
The model developed in this study correctly identified key vertebral points on lateral lumbar radiographs and automatically calculated lumbosacral radiographic parameters. Its measurements showed good agreement with, and reliability relative to, manual measurements. With further training and optimization, this technology may support routine measurement in clinical practice and the analysis of large datasets.
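The agreement statistics reported in the Results can be reproduced from paired model and reference values. The sketch below uses only the standard library. The abstract does not state which ICC form was used, so ICC(2,1) (two-way random effects, absolute agreement, single measurement, after Shrout and Fleiss) is an assumption, as is the PCK distance threshold; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _check(pred: Sequence, ref: Sequence) -> None:
    """Require paired inputs with at least two cases."""
    if len(pred) != len(ref) or len(ref) < 2:
        raise ValueError("pred and ref need equal length and at least two cases")


def mae(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error, in the parameter's own unit (degrees or %)."""
    _check(pred, ref)
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean square error; penalizes large misses more than MAE does."""
    _check(pred, ref)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson_r(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired measurements."""
    _check(pred, ref)
    mp, mr = sum(pred) / len(pred), sum(ref) / len(ref)
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sr = math.sqrt(sum((r - mr) ** 2 for r in ref))
    return cov / (sp * sr)


def icc_2_1(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Assumed ICC(2,1), treating model and reference as k = 2 raters."""
    _check(pred, ref)
    n, k = len(ref), 2
    rows = list(zip(pred, ref))
    grand = (sum(pred) + sum(ref)) / (n * k)
    ss_rows = k * sum(((p + r) / k - grand) ** 2 for p, r in rows)
    ss_cols = n * ((sum(pred) / n - grand) ** 2 + (sum(ref) / n - grand) ** 2)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def pck(pred_pts: Sequence[Tuple[float, float]],
        ref_pts: Sequence[Tuple[float, float]], threshold: float) -> float:
    """Fraction of predicted key points within `threshold` (hypothetical unit,
    e.g. pixels or mm) of the corresponding reference key point."""
    _check(pred_pts, ref_pts)
    hits = sum(math.dist(p, r) <= threshold for p, r in zip(pred_pts, ref_pts))
    return hits / len(ref_pts)
```

Because ICC(2,1) measures absolute agreement, a constant offset between model and reference lowers it even when r = 1; for example, values that differ by exactly 1° at every case give r = 1 but ICC < 1. This is one reason ICC and r are usually reported together.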