Purpose
To evaluate the uncertainty of radiomics features extracted from contrast-enhanced breath-hold helical CT scans of non-small cell lung cancer, for both manual and semi-automatic segmentation, in terms of intra-observer, inter-observer, and inter-software reliability.

Methods
Three radiation oncologists manually delineated lung tumors twice on 10 CT scans using two software tools (3D-Slicer and MIM Maestro). Additionally, three observers without formal clinical training were instructed to use two semi-automatic segmentation tools, Lesion Sizing Toolkit (LSTK) and GrowCut, to delineate the same tumor volumes. The accuracy of the semi-automatic contours was assessed by comparison with the physicians' manual contours using Dice similarity coefficients and Hausdorff distances. Eighty-three radiomics features were calculated for each delineated tumor contour. Informative features were identified based on their dynamic range and their correlation with other features. Feature reliability was then evaluated using intra-class correlation coefficients (ICC). Feature range was used to evaluate the uncertainty of the segmentation methods.

Results
Of the initial 83 radiomics features, 40 were found to be informative and were used in the subsequent analyses. For both intra-observer and inter-observer reliability, LSTK had higher reliability than GrowCut and the two manual segmentation tools. All observers achieved consistently high ICC values when using LSTK, whereas ICC values varied greatly across observers when using GrowCut and the manual segmentation tools. For inter-software reliability, features were not reproducible across the software tools for either manual or semi-automatic segmentation. Additionally, no feature category was found to be more reproducible than any other. Feature ranges of LSTK contours were smaller than those of manual contours for all features.

Conclusion
Radiomics features extracted from LSTK contours were highly reliable both within and across observers. With semi-automatic segmentation tools, observers without formal clinical training were comparable to physicians in evaluating tumor segmentation.
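
The agreement measures named in the Methods can be illustrated with a minimal Python sketch. It is not the study's implementation. It assumes contours are represented as sets of voxel index tuples, so Hausdorff distances come out in voxel units rather than millimetres. It also assumes the two-way random-effects, absolute-agreement, single-rater form ICC(2,1); the abstract does not say which ICC form was used. The function names `dice_coefficient`, `hausdorff_distance`, and `icc_2_1` are hypothetical.

```python
import math
from typing import Sequence, Set, Tuple

Voxel = Tuple[int, ...]  # assumed representation: integer voxel indices


def dice_coefficient(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice similarity: 2|A ∩ B| / (|A| + |B|), from 0 (disjoint) to 1 (identical)."""
    if not a and not b:
        raise ValueError("Dice is undefined when both contours are empty")
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Symmetric Hausdorff distance between two non-empty voxel sets (brute force)."""
    if not a or not b:
        raise ValueError("Hausdorff distance needs two non-empty contours")

    def directed(src: Set[Voxel], dst: Set[Voxel]) -> float:
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for an n-by-k table: rows are tumors, columns are observers.

    Assumption: two-way random effects, absolute agreement, single rater.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_err) / denom


if __name__ == "__main__":
    # Toy 2D contours (made-up data): two 2x2 squares overlapping in one column.
    manual = {(0, 0), (0, 1), (1, 0), (1, 1)}
    semi_auto = {(0, 1), (1, 1), (0, 2), (1, 2)}
    print(dice_coefficient(manual, semi_auto))
    print(hausdorff_distance(manual, semi_auto))
```

In this sketch one feature's values would be arranged with one row per tumor and one column per observer (or per repeated delineation) before being passed to `icc_2_1`. The brute-force Hausdorff search is quadratic in the number of voxels, so it suits only small examples.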