Radiation-induced xerostomia, as a major problem in radiation treatment of the head and neck cancer, is mainly due to the overdose irradiation injury to the parotid glands. Helical Tomotherapy-based megavoltage computed tomography (MVCT) imaging during the Tomotherapy treatment can be applied to monitor the successive variations in the parotid glands. While manual segmentation is time consuming, laborious, and subjective, automatic segmentation is quite challenging due to the complicated anatomical environment of head and neck as well as noises in MVCT images. In this article, we propose a localization-refinement scheme to segment the parotid gland in MVCT. After data pre-processing we use mask region convolutional neural network (Mask R-CNN) in the localization stage after data pre-processing, and design a modified U-Net in the following fine segmentation stage. To the best of our knowledge, this study is a pioneering work of deep learning on MVCT segmentation. Comprehensive experiments based on different data distribution of head and neck MVCTs and different segmentation models have demonstrated the superiority of our approach in terms of accuracy, effectiveness, flexibility, and practicability. Our method can be adopted as a powerful tool for radiation-induced injury studies, where accurate organ segmentation is crucial.