Purpose
Manual delineation of lung tumors on all breathing phases of 4D CT image datasets can be challenging, laborious, and prone to subjective error, owing to both the large number of images in each dataset and variations in the spatial location of tumors secondary to respiratory motion. The purpose of this work is to present a new deep learning (DL)-based framework for fast and accurate segmentation of lung tumors on 4D CT image sets.
Methods
The proposed DL framework leverages a motion region-based convolutional neural network (R-CNN). By integrating global and local motion estimation network architectures, the network can learn both major and minor changes caused by tumor motion. Our network design first extracts tumor motion information by feeding 4D CT images of consecutive phases into an integrated backbone network architecture, locating volumes of interest (VOIs) via a region proposal network and removing irrelevant information via a regional convolutional neural network. The extracted motion information is then passed to the subsequent global and local motion head network architecture to predict the corresponding deformation vector fields (DVFs) and further adjust the tumor VOIs. Binary tumor masks are then segmented within the adjusted VOIs via a mask head. A self-attention strategy is incorporated in the mask head network to remove noisy features that might degrade segmentation performance. We performed two sets of experiments. In the first experiment, we performed a five-fold cross-validation on 20 4D CT datasets, each consisting of 10 breathing phases (i.e., 200 3D image volumes in total). The network performance was also evaluated on an additional unseen 200 3D image volumes from 20 hold-out 4D CT datasets. In the second experiment, we trained another model on the 40 patients' 4D CT datasets from experiment 1 and evaluated it on nine additional unseen patients' 4D CT datasets. The Dice similarity coefficient (DSC), center of mass distance (CMD), 95th percentile Hausdorff distance (HD95), mean surface distance (MSD), and volume difference (VD) between the manual and segmented tumor contours were computed to evaluate tumor detection and segmentation accuracy. The performance of our method was quantitatively compared against four alternatives (VoxelMorph, U-Net, the network without the global and local networks, and the network without the attention gate strategy) across all evaluation metrics using paired t-tests.
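The evaluation metrics listed above are standard overlap and distance measures between binary masks. As an illustration only, DSC, CMD, and VD can be computed with NumPy/SciPy roughly as follows; the function names and the isotropic-spacing default are illustrative assumptions, not part of the described framework:

```python
import numpy as np
from scipy import ndimage


def dice_coefficient(pred, gt):
    """Dice similarity coefficient: 2|A∩B| / (|A| + |B|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2.0 * intersection / total if total > 0 else 1.0


def center_of_mass_distance(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Euclidean distance between mask centroids, scaled by voxel spacing."""
    c_pred = np.array(ndimage.center_of_mass(pred)) * np.asarray(spacing)
    c_gt = np.array(ndimage.center_of_mass(gt)) * np.asarray(spacing)
    return float(np.linalg.norm(c_pred - c_gt))


def volume_difference(pred, gt, voxel_volume=1.0):
    """Absolute volume difference between the two masks, in physical units."""
    return abs(int(pred.sum()) - int(gt.sum())) * voxel_volume
```

HD95 and MSD additionally require surface-to-surface distances (e.g., via distance transforms of the mask boundaries) and are omitted here for brevity.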
Results
The proposed fully automated DL method yielded good overall agreement with the ground truth in contoured tumor volume and segmentation accuracy. Our model yielded significantly better values of the evaluation metrics (p < 0.05) than all four competing methods in both experiments. On the hold-out datasets of experiments 1 and 2, our method yielded DSCs of 0.86 and 0.90, compared with 0.82 and 0.87 (VoxelMorph), 0.75 and 0.83 (U-Net), 0.81 and 0.89 (network without global and local networks), and 0.81 and 0.89 (network without attention gate strategy), respectively. Tumor VD between ground truth and our method was the smal...