Purpose
We propose a systematic methodology for quantifying incidentally identified pulmonary nodules from observed radiological traits (semantic features) scored on a point scale, together with a machine learning method that uses these scores to predict cancer status.
Materials and Methods
We investigated 172 patients with low-dose computed tomography (LDCT) images, of whom 102 were assigned to the training cohort and 70 to the validation cohort. On these images, 24 radiological traits were systematically scored, and a linear classifier was built to relate the traits to malignancy status. Models were built both with and without size descriptors to remove bias due to nodule size. The multivariate feature sets derived on the training set were then tested on an independent validation set to evaluate their performance.
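The modeling steps described above can be sketched as follows. This is a minimal, hypothetical sketch rather than the authors' implementation: the abstract states only that a linear classifier was used, so we assume logistic regression fitted by batch gradient descent, and we assume the best four-trait set is found by exhaustive search ranked by training AUROC. The function names `fit_logistic`, `auroc`, and `best_subset`, the `exclude` parameter (which drops size descriptors to mimic Set 2), and the toy data are all illustrative.

```python
import itertools
import math


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit a logistic-regression classifier (one possible linear classifier;
    an assumption) by batch gradient descent.
    X: rows of trait scores; y: 1 = malignant, 0 = benign."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted prob minus label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auroc(scores, labels):
    """AUROC as the probability that a malignant case outscores a benign one
    (Mann-Whitney formulation; ties count as 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def best_subset(X, y, names, k=4, exclude=()):
    """Exhaustively search k-trait subsets, skipping any trait named in
    `exclude` (e.g. size descriptors); return (best AUROC, trait names)."""
    candidates = [i for i, name in enumerate(names) if name not in exclude]
    best_auc, best_names = -1.0, None
    for cols in itertools.combinations(candidates, k):
        Xs = [[row[c] for c in cols] for row in X]
        w, b = fit_logistic(Xs, y)
        scores = [b + sum(wi * xi for wi, xi in zip(w, r)) for r in Xs]
        auc = auroc(scores, y)
        if auc > best_auc:
            best_auc, best_names = auc, [names[c] for c in cols]
    return best_auc, best_names


# Toy data (hypothetical): only "short_axis" separates the two classes.
names = ["short_axis", "contour", "noise"]
X = [[1, 2, 5], [2, 1, 5], [3, 3, 5], [7, 2, 5], [8, 1, 5], [9, 3, 5]]
y = [0, 0, 0, 1, 1, 1]
print(best_subset(X, y, names, k=1))  # → (1.0, ['short_axis'])
```

With 24 traits and k = 4 there are C(24, 4) = 10,626 candidate subsets, so this pure-Python loop is illustrative only; a vectorized library would be used in practice. Because the subset is chosen on training AUROC, its performance should be confirmed on an independent validation set, as the study does.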
Results
The best four-feature set that included a size measure (Set 1) comprised short axis, contour, concavity, and texture, with an area under the receiver operating characteristic curve (AUROC) of 0.88 (accuracy, 81%; sensitivity, 76.2%; specificity, 91.7%). When size measures were excluded, the best four features (Set 2) were location, fissure attachment, lobulation, and spiculation, with an AUROC of 0.83 (accuracy, 73.2%; sensitivity, 73.8%; specificity, 81.7%) for predicting malignancy in primary nodules. On the validation set, the AUROC was 0.8 (accuracy, 74.3%; sensitivity, 66.7%; specificity, 75.6%) for Set 1 and 0.74 (accuracy, 71.4%; sensitivity, 61.9%; specificity, 75.5%) for Set 2.
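The accuracy, sensitivity, and specificity reported above depend on a decision threshold applied to the classifier output, which the abstract does not state. The sketch below assumes a hypothetical threshold of 0.5 on predicted probabilities and uses a hypothetical function name, `threshold_metrics`, with made-up scores. It shows how the three figures follow from a confusion matrix, and that accuracy is the prevalence-weighted average of sensitivity and specificity.

```python
def threshold_metrics(scores, labels, threshold=0.5):
    """Return (accuracy, sensitivity, specificity) when scores >= threshold
    are called malignant. Labels: 1 = malignant (positive), 0 = benign."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_malignant = s >= threshold
        if predicted_malignant and y == 1:
            tp += 1
        elif predicted_malignant:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn)  # fraction of malignant nodules flagged
    specificity = tn / (tn + fp)  # fraction of benign nodules cleared
    accuracy = (tp + tn) / len(labels)
    return accuracy, sensitivity, specificity


# Hypothetical predicted probabilities for 3 malignant and 4 benign nodules.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4]
labels = [1, 1, 1, 0, 0, 0, 0]
acc, sens, spec = threshold_metrics(scores, labels)
print(round(acc, 3), round(sens, 3), round(spec, 3))  # → 0.714 0.667 0.75

# Accuracy equals prevalence * sensitivity + (1 - prevalence) * specificity.
prevalence = sum(labels) / len(labels)
print(abs(acc - (prevalence * sens + (1 - prevalence) * spec)) < 1e-12)  # → True
```

Moving the threshold trades sensitivity against specificity, whereas the AUROC summarizes performance over all thresholds.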
Conclusions
Radiological image traits are useful for predicting malignancy in lung nodules. These semantic traits can be combined with size-based measures to improve prediction accuracy and reduce false positives.