Study Design.
This retrospective, cross-sectional, population-based study used deep learning to automatically measure facet joint angles from T2-weighted axial MRIs of the lumbar spine.
Objective.
We aimed to introduce a semi-automatic, deep learning-based framework for measuring facet joint (FJ) angles and to use it to study facet tropism (FT) in a large Finnish population-based cohort.
Summary of Data.
T2-weighted axial MRIs of the lumbar spine (L3/4 through L5/S1) from participants in the NFBC1966 Finnish population-based cohort (n=1288) were used for this study.
Materials and Methods.
A deep learning model was developed and trained on MRI images from 430 participants. FJ angles were computed from the model’s predictions at each level (L3/4 through L5/S1) for the male and female subgroups. Inter- and intra-rater reliability was analyzed for 60 participants using annotations made by two radiologists and a musculoskeletal researcher. Rater agreement was evaluated both for the annotations and for the FJ angles computed from them: the Dice score was used to compare annotations between raters, FJ asymmetry (the difference between the left and right FJ angles) was used to evaluate agreement and correlation between raters, and Bland-Altman (BA) analysis was used to assess agreement and systematic bias in the FJ asymmetry. Model predictions were evaluated on the independent test set against the ground-truth annotations. With the developed method, we examined FT in the entire NFBC1966 cohort, adopting the literature FT thresholds of 7° and 10°.
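As an illustration of the quantities named above, the following minimal sketch computes FJ asymmetry, flags FT at a chosen threshold, and derives the Bland-Altman bias and limits of agreement between two raters. It is not the study's implementation: the function names are hypothetical, asymmetry is assumed to be the absolute left-right difference in degrees, FT is assumed to be flagged at or above the threshold, and the limits of agreement use the conventional bias ± 1.96 SD.

```python
import numpy as np


def facet_asymmetry(left_deg, right_deg):
    """Absolute left-right difference in FJ angle, in degrees (assumed definition)."""
    left = np.asarray(left_deg, dtype=float)
    right = np.asarray(right_deg, dtype=float)
    return np.abs(left - right)


def has_tropism(left_deg, right_deg, threshold_deg=7.0):
    """Flag FT where asymmetry is at or above the threshold (assumed inclusive)."""
    return facet_asymmetry(left_deg, right_deg) >= threshold_deg


def bland_altman(rater_a, rater_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diff = np.asarray(rater_a, dtype=float) - np.asarray(rater_b, dtype=float)
    bias = diff.mean()          # systematic difference between the raters
    sd = diff.std(ddof=1)       # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up angles for three joints, for illustration only.
left = [40.0, 50.0, 30.0]
right = [45.0, 38.0, 38.0]
print(facet_asymmetry(left, right))    # asymmetry per joint
print(has_tropism(left, right, 7.0))   # FT at the 7 degree threshold
print(has_tropism(left, right, 10.0))  # FT at the 10 degree threshold
```

Passing the asymmetry values obtained from two raters' annotations to `bland_altman` gives the bias and limits of agreement for that rater pair.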
Results.
Our model achieved a Dice score of 92.7±0.1 and an IoU of 87.1±0.2, aggregated across all regions of interest: vertebral body (VB), facet joints (FJ), and posterior arch (PA). The mean FJ angles measured for the male and female subgroups agreed with values reported in the literature. Intra-rater reliability was high, with Dice scores of 97.3 (VB), 82.5 (FJ), and 90.3 (PA). Inter-rater reliability was higher between the two radiologists, with Dice scores of 96.4 (VB), 75.5 (FJ), and 85.8 (PA), than between the radiologists and the musculoskeletal researcher. The prevalence of FT was higher in the male subgroup, and L4/5 was the most affected level.
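For readers unfamiliar with the overlap metrics reported above, the sketch below shows how Dice and IoU are commonly computed for a pair of binary segmentation masks. It is an illustrative example under stated assumptions, not the evaluation code used in the study: the function name is hypothetical, the masks are toy arrays, and two empty masks are assumed to count as perfect agreement.

```python
import numpy as np


def dice_and_iou(pred_mask, true_mask):
    """Return (Dice, IoU) for two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    total = pred.sum() + true.sum()
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0, 1.0
    union = total - intersection
    return 2.0 * intersection / total, intersection / union


# Toy 2x3 masks, for illustration only.
pred = [[1, 1, 0],
        [0, 0, 0]]
true = [[1, 0, 0],
        [1, 0, 0]]
dice, iou = dice_and_iou(pred, true)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

For a multi-class segmentation such as VB, FJ, and PA, the same function can be applied per class by comparing each class label against the background in turn.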
Conclusion.
We developed a deep learning-based framework that enabled us to study FT in a large cohort. Using the proposed method, we present the prevalence of FT in a Finnish population-based cohort.