Objectives
Although magnetic resonance imaging–based formalized grading schemes for intervertebral disc degeneration offer improved reproducibility compared with purely subjective ratings, their intrarater and interrater reliability remain too low to detect small to medium effects in clinical longitudinal studies. The aim of this study was therefore to develop a method that enables automatic, and thus reproducible and reliable, evaluation of disc degeneration based on conventional clinical image data and Pfirrmann's grading scheme.
Materials and Methods
We propose a classifier based on a deep convolutional neural network that we trained on a large, manually evaluated data set of 1599 patients (7948 intervertebral discs). To improve upon the status quo, we focused on the quality of the training data and performed extensive hyperparameter optimization. We assessed the potential benefits of optimizing loss functions beyond common cross-entropy loss, such as soft kappa loss, ordinal cross-entropy loss, or regression losses. We furthermore experimented with ways to mitigate class imbalance by pooling classes or using class-weighted loss functions. During model development and hyperparameter optimization, we used a fixed 90%/10% training/validation set split. To estimate real-world prediction performance, we performed 10-fold cross-validation.
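As an illustration of the class-weighted loss idea mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes inverse-frequency weights and a loss normalized by the sum of weights, which is one common convention; the abstract does not state which weighting scheme was used. Pfirrmann grades 1 to 5 are assumed to be mapped to class indices 0 to 4, and the function names are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Weight each class by N / (K * count) so that rare grades count more.

    Assumption: this weighting scheme is illustrative; classes that never
    occur in `labels` receive weight 0.
    """
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(logits_batch, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of weights."""
    loss_sum = 0.0
    weight_sum = 0.0
    for logits, y in zip(logits_batch, labels):
        p_true = softmax(logits)[y]
        loss_sum += weights[y] * -math.log(p_true)
        weight_sum += weights[y]
    return loss_sum / weight_sum
```

With uniform weights this reduces to ordinary cross-entropy, so the weighted variant only changes how strongly errors on the underrepresented grades are penalized.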
Results
The degeneration grades in the evaluated image data follow a Gaussian distribution, so grades 1 and 5 are slightly underrepresented in the training set. Our default cross-entropy–based classifier achieves a reliability of κ = 0.92 (Cohen κ), an average sensitivity of 90.2%, and an average precision of 92.5%. In 99.2% of validation cases, the network's prediction deviates by at most 1 Pfirrmann grade from the ground truth. Framed as an ordinal regression problem, the mean absolute error between the ground truth and the prediction is 0.08 Pfirrmann grade, with a correlation of r = 0.96. The results of the 10-fold cross-validation confirm those performance estimates, indicating no substantial overfitting. More sophisticated loss functions, class-based loss weighting, or class pooling did not lead to improved classification performance overall.
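The metrics reported above can be computed from paired ground-truth and predicted grades roughly as follows. This is a sketch under stated assumptions: κ is computed as unweighted Cohen κ (the abstract does not say whether a weighted variant was used), "average" sensitivity and precision are taken as unweighted means over classes, and grades are mapped to indices 0 to 4. The function name `grading_metrics` is hypothetical.

```python
def grading_metrics(y_true, y_pred, num_classes=5):
    """Agreement and error metrics for ordinal grade predictions."""
    # Confusion matrix: rows are true grades, columns are predicted grades.
    m = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1

    n = len(y_true)
    row = [sum(m[i]) for i in range(num_classes)]
    col = [sum(m[i][j] for i in range(num_classes)) for j in range(num_classes)]

    # Unweighted Cohen kappa: observed vs. chance agreement.
    # Undefined (division by zero) if chance agreement equals 1.
    p_observed = sum(m[i][i] for i in range(num_classes)) / n
    p_expected = sum(r * c for r, c in zip(row, col)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected)

    # Macro averages, skipping classes with no true or predicted samples.
    sens = [m[i][i] / row[i] for i in range(num_classes) if row[i]]
    prec = [m[i][i] / col[i] for i in range(num_classes) if col[i]]

    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return {
        "kappa": kappa,
        "macro_sensitivity": sum(sens) / len(sens),
        "macro_precision": sum(prec) / len(prec),
        "mae": sum(abs_err) / n,
        "within_one_grade": sum(e <= 1 for e in abs_err) / n,
    }
```

Reporting the mean absolute error and the within-one-grade fraction alongside κ reflects that the grades are ordinal: a prediction that is off by one grade is a smaller error than one that is off by three.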
Conclusions
With a reliability of κ > 0.9, our system clearly outperforms average human interrater as well as intrarater reliability. With an average sensitivity of more than 90%, our classifier also surpasses state-of-the-art machine learning solutions for automatically grading disc degeneration.
Spinal lesion differential diagnosis remains challenging even with MRI. Radiomics and machine learning (ML) have proven useful even in the absence of a standardized data mining pipeline. We aimed to assess ML diagnostic performance in spinal lesion differential diagnosis, employing radiomic data extracted by different software. Methods: Patients undergoing MRI for a vertebral lesion were retrospectively analyzed (n = 146, 67 males, 79 females; mean age 63 ± 16 years, range 8-89 years) and constituted the training (n = 100) and internal test (n = 46) cohorts. Some patients in the latter had additional prior exams, which constituted a multi-scanner, external test cohort (n = 35). Lesions were labeled as benign or malignant (2-label classification), and as benign, primary malignant, or metastases (3-label classification) for classification analyses. Features extracted via the 3D Slicer heterogeneityCAD module (hCAD) and PyRadiomics were used independently to compare different combinations of feature selection methods and ML classifiers (n = 19). Results: In total, 90 and 1548 features were extracted by hCAD and PyRadiomics, respectively. The best combination of feature selection method and ML algorithm was selected by 10 iterations of 10-fold cross-validation in the training data. For the 2-label classification, ML obtained 94% accuracy in the internal test cohort, using hCAD data, and 86% in the external one. For the 3-label classification, PyRadiomics data allowed for 80% and 69% accuracy in the internal and external test sets, respectively. Conclusions: MRI radiomics combined with ML may be useful in spinal lesion assessment. More robust preprocessing led to better consistency despite scanner and protocol heterogeneity.
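Selecting a pipeline by repeated k-fold cross-validation on the training data, as described above, can be sketched as follows. This is an illustrative outline only: the abstract does not name the 19 feature selection and classifier combinations, so the two candidates below (`majority_class`, `nearest_mean`) are hypothetical stand-ins, the folds are not stratified, and accuracy is assumed as the selection metric. Each candidate is a callable that fits on the training fold only and returns predictions for the held-out fold, so any feature selection inside it never sees held-out data.

```python
import random
from statistics import mean


def kfold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv_score(fit_predict, X, y, k=10, repeats=10, seed=0):
    """Mean held-out accuracy over `repeats` runs of k-fold cross-validation."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in kfold_indices(len(X), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            preds = fit_predict(
                [X[i] for i in train_idx],
                [y[i] for i in train_idx],
                [X[i] for i in test_idx],
            )
            correct = sum(p == y[i] for p, i in zip(preds, test_idx))
            scores.append(correct / len(test_idx))
    return mean(scores)


def select_best(candidates, X, y, **cv_args):
    """Score every candidate pipeline and return the best name and all scores."""
    results = {
        name: repeated_cv_score(fn, X, y, **cv_args)
        for name, fn in candidates.items()
    }
    return max(results, key=results.get), results


# Hypothetical toy candidates operating on a single numeric feature.
def majority_class(X_train, y_train, X_test):
    most_common = max(set(y_train), key=y_train.count)
    return [most_common] * len(X_test)


def nearest_mean(X_train, y_train, X_test):
    means = {
        c: mean(x for x, t in zip(X_train, y_train) if t == c)
        for c in set(y_train)
    }
    return [min(means, key=lambda c: abs(x - means[c])) for x in X_test]
```

The selected pipeline would then be refit on the full training cohort and evaluated once on the held-out test cohorts, which keeps the test accuracy independent of the selection step.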