In this paper, we propose a new technique for segmenting the Air-Tissue Boundaries (ATBs) of the vocal tract in real-time magnetic resonance imaging (rtMRI) videos of the upper airway in the midsagittal plane. The proposed technique performs semantic segmentation using a deep learning architecture called Fully Convolutional Networks (FCN). The architecture takes an input image and produces output images of the same size, with an air or tissue class label at each pixel. These output images are post-processed using morphological filling and image smoothing to predict realistic ATBs. The predicted contours are evaluated using the Dynamic Time Warping (DTW) distance between the manually annotated ground-truth contours and the predicted contours. Four-fold experiments with four subjects from the USC-TIMIT corpus (with approximately 2900 training images in every fold) demonstrate that the proposed FCN-based approach has 8.87% and 9.65% lower average error than the baseline Maeda Grid-based scheme for the lower and upper ATBs, respectively. In addition, the proposed FCN-based rtMRI segmentation achieves an average pixel classification accuracy of 99.05% across all subjects.
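To illustrate the evaluation step, the following is a minimal sketch of a DTW distance between a predicted contour and a ground-truth contour, each given as an ordered list of 2-D points. The function name `dtw_distance`, the Euclidean local cost, and the absence of any path-length normalization are assumptions made for illustration; the abstract does not state which DTW variant or normalization the authors use.

```python
import numpy as np


def dtw_distance(pred, ref):
    """DTW distance between two contours given as (N, 2) and (M, 2) point arrays.

    Assumption: the local cost is the Euclidean distance between points and
    the result is the unnormalized accumulated cost of the best alignment.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n, m = len(pred), len(ref)

    # Pairwise Euclidean distances between every predicted and reference point.
    cost = np.linalg.norm(pred[:, None, :] - ref[None, :, :], axis=2)

    # acc[i, j] holds the minimal accumulated cost of aligning the first i
    # predicted points with the first j reference points.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # advance along the predicted contour only
                acc[i, j - 1],      # advance along the reference contour only
                acc[i - 1, j - 1],  # advance along both contours
            )
    return acc[n, m]
```

Because DTW aligns points monotonically, the two contours may contain different numbers of points, which is convenient when a predicted boundary and a manual annotation are sampled at different densities.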