Objectives
To develop a deep learning methodology that distinguishes early from late stages of avascular necrosis (AVN) of the hip, in order to guide treatment decisions.
Methods
Three convolutional neural networks (CNNs), namely VGG-16, Inception-ResnetV2, and InceptionV3, were pretrained on ImageNet (transfer learning) and fine-tuned on a retrospectively collected cohort of MRI examinations of AVN patients (n = 104) to differentiate between early (ARCO 1–2) and late (ARCO 3–4) stages. A consensus CNN ensemble decision was recorded as the agreement of at least two CNNs. CNN and ensemble performance was benchmarked on an independent cohort of 49 patients from another country and compared with the performance of two MSK radiologists. Performance was expressed as the area under the curve (AUC) with its 95% confidence interval (CI), together with precision, recall, and F1-score. AUCs were compared with DeLong’s test.
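The consensus rule and the threshold-based metrics described above can be illustrated with a minimal sketch. It assumes each CNN outputs a binary label per examination (0 = early, ARCO 1–2; 1 = late, ARCO 3–4). The function names, variable names, and the five example labels are hypothetical and are not taken from the study; the AUC, CI, and DeLong computations are not covered here.

```python
from typing import List, Sequence, Tuple


def ensemble_vote(votes: Sequence[int]) -> int:
    """Return the label on which at least two of the three CNNs agree."""
    if len(votes) != 3:
        raise ValueError("expected exactly one vote per CNN (three in total)")
    # With three binary votes, a sum of 2 or 3 means at least two CNNs said "late".
    return 1 if sum(votes) >= 2 else 0


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1-score for the positive (late-stage) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical per-examination labels, for illustration only (not study data).
y_true = [1, 0, 1, 1, 0]
vgg16_pred = [1, 0, 0, 1, 1]
inception_resnet_v2_pred = [1, 0, 1, 0, 0]
inception_v3_pred = [0, 0, 1, 0, 0]

ensemble_pred: List[int] = [
    ensemble_vote(votes)
    for votes in zip(vgg16_pred, inception_resnet_v2_pred, inception_v3_pred)
]
precision, recall, f1 = precision_recall_f1(y_true, ensemble_pred)
print(ensemble_pred, precision, recall, f1)
```

Because the task is binary and three networks vote, at least two of them always agree, so the consensus rule reduces to a simple majority vote and never leaves a case undecided.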
Results
On internal testing, Inception-ResnetV2 achieved the highest individual performance with an AUC of 99.7% (95% CI 99–100%), followed by InceptionV3 and VGG-16 with AUCs of 99.3% (95% CI 98.4–100%) and 97.3% (95% CI 95.5–99.2%), respectively. The CNN ensemble achieved the same AUC as Inception-ResnetV2. On external validation, model performance dropped, with VGG-16 achieving the highest individual AUC of 78.9% (95% CI 51.6–79.6%). The best external performance was achieved by the model ensemble, with an AUC of 85.5% (95% CI 72.2–93.9%). No significant difference was found between the CNN ensemble and either of the two expert MSK radiologists (p = 0.22 and 0.092, respectively).
Conclusion
An externally validated CNN ensemble accurately distinguishes between the early and late stages of AVN and has comparable performance to expert MSK radiologists.
Clinical relevance statement
This paper introduces the use of deep learning for the differentiation between early and late avascular necrosis of the hip, assisting in a complex clinical decision that can determine the choice between conservative and surgical treatment.
Key Points
• A convolutional neural network ensemble achieved excellent performance in distinguishing between early and late avascular necrosis.
• The performance of the deep learning method was similar to the performance of expert readers.