This study aims to develop a CT-based radiomic feature analysis approach for diagnosing ground-glass opacity (GGO) pulmonary nodules, and to assess whether computer-aided diagnosis (CADx) performance in classifying benign versus malignant nodules changes with the histopathological subtype of the malignant nodules, namely adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (IAC).
The study involves 182 histopathology-confirmed GGO nodules collected from two cancer centers. Among them, 59 are benign, 50 are AIS, 32 are MIA, and 41 are IAC nodules. Four training/testing data sets are assembled according to histopathological subtype: (1) all nodules, (2) benign and AIS nodules, (3) benign and MIA nodules, and (4) benign and IAC nodules. We first segment the pulmonary nodules depicted in CT images using a 3D region growing and geodesic active contour level set algorithm. We then compute 1117 quantitative imaging features from the 3D segmented nodules. After normalizing the radiomic features, we apply a leave-one-out cross-validation (LOOCV) method to build models that embed Relief feature selection, the synthetic minority oversampling technique (SMOTE), and three machine-learning classifiers: a support vector machine, logistic regression, and Gaussian naïve Bayes.
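The cross-validation procedure described above can be illustrated with a minimal sketch. It is not the study's implementation: it uses only the Python standard library, a simplified Relief and a simplified SMOTE, and only the Gaussian naïve Bayes classifier. The toy data, the function names, and the parameters `n_keep`, `k`, and `seed` are hypothetical. The sketch also assumes that normalization, feature selection, and oversampling are fitted inside each fold on the training samples only, whereas the text describes normalization as a step performed before cross-validation.

```python
import math
import random

def fit_scaler(rows):
    """Per-feature mean and standard deviation (zero spread is replaced by 1.0)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(zip(*rows), means)]
    return means, stds

def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def relief_weights(X, y):
    """Simplified Relief: a feature gains weight when it differs at the
    nearest other-class sample and agrees at the nearest same-class sample."""
    w = [0.0] * len(X[0])
    for i, xi in enumerate(X):
        hits = [x for k, x in enumerate(X) if k != i and y[k] == y[i]]
        misses = [x for k, x in enumerate(X) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda x: dist(xi, x))
        miss = min(misses, key=lambda x: dist(xi, x))
        for j in range(len(w)):
            w[j] += abs(xi[j] - miss[j]) - abs(xi[j] - hit[j])
    return w

def smote(X, y, rng, k=3):
    """Balance classes by interpolating between a minority sample and one of
    its k nearest minority neighbours (simplified SMOTE)."""
    counts = {c: y.count(c) for c in set(y)}
    minority = min(counts, key=counts.get)
    pool = [x for x, c in zip(X, y) if c == minority]
    X_out, y_out = list(X), list(y)
    if len(pool) < 2:
        return X_out, y_out
    for _ in range(max(counts.values()) - counts[minority]):
        base = rng.choice(pool)
        near = sorted((p for p in pool if p is not base),
                      key=lambda p: dist(base, p))[:k]
        other, t = rng.choice(near), rng.random()
        X_out.append([b + t * (o - b) for b, o in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out

def fit_gnb(X, y):
    """Gaussian naive Bayes: class log-prior plus per-feature mean and spread."""
    model = {}
    for c in set(y):
        rows = [x for x, lab in zip(X, y) if lab == c]
        means, stds = fit_scaler(rows)
        model[c] = (math.log(len(rows) / len(X)), means,
                    [max(s, 1e-3) for s in stds])
    return model

def log_odds(model, x):
    """Log-likelihood of class 1 minus class 0; positive favours class 1."""
    def loglik(c):
        prior, means, stds = model[c]
        return prior + sum(-math.log(s) - 0.5 * ((v - m) / s) ** 2
                           for v, m, s in zip(x, means, stds))
    return loglik(1) - loglik(0)

def loocv_scores(X, y, n_keep=2, seed=0):
    """Hold out each sample once; every fitted step sees the training fold only."""
    rng = random.Random(seed)
    scores = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        means, stds = fit_scaler(X_tr)
        X_tr = [scale(r, means, stds) for r in X_tr]
        x_te = scale(X[i], means, stds)
        w = relief_weights(X_tr, y_tr)
        keep = sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:n_keep]
        X_tr = [[r[j] for j in keep] for r in X_tr]
        X_bal, y_bal = smote(X_tr, y_tr, rng)
        model = fit_gnb(X_bal, y_bal)
        scores.append(log_odds(model, [x_te[j] for j in keep]))
    return scores

# Toy data (hypothetical): feature 0 separates the classes, the rest is noise.
y = [0] * 12 + [1] * 6
X = [[10.0 * lab + 0.1 * (i % 5), float((i * 7) % 5), float((i * 3) % 4)]
     for i, lab in enumerate(y)]
scores = loocv_scores(X, y)
print(sum((s > 0) == (lab == 1) for s, lab in zip(scores, y)), "of", len(y), "correct")
```

Fitting every data-dependent step inside the fold keeps the held-out nodule from influencing feature selection or the synthetic samples, which is the usual motivation for embedding these steps in the cross-validation loop.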
When the three classifiers are trained and tested separately on the four data sets, the average areas under the receiver operating characteristic curve (AUC) are 0.75, 0.55, 0.77, and 0.93, respectively. On an independent test data set, our scheme yields higher accuracy than two radiologists (61.3% versus 53.1% for radiologist 1 and 56.3% for radiologist 2).
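For readers less familiar with the metric, the AUC can be computed directly from classifier scores as the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one. The following is a minimal sketch under that rank-based definition; the function name `roc_auc` and the example scores are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; tied pairs count as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores for two benign (0) and two malignant (1) cases.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 3 of 4 pairs ordered correctly -> 0.75
```

Under this definition an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.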
This study demonstrates (1) the feasibility of using a CT-based radiomic feature analysis approach to distinguish between benign and malignant GGO nodules, (2) higher performance of the CADx scheme than of radiologists in diagnosing GGO nodules, and (3) a consistently positive trend between classification performance and the invasive grade of GGO nodules. Thus, to improve CADx performance in diagnosing GGO nodules, one should assemble an optimal training data set dominated by nodules associated with non-invasive lung adenocarcinoma (i.e., AIS and MIA).