Background: Functional assessment of the right ventricle (RV) using gated myocardial perfusion single‐photon emission computed tomography (MPS) relies heavily on the precise extraction of right ventricular contours.

Purpose: In this paper, we present a new deep‐learning‐based model that integrates both the spatial and temporal features of gated MPS images to segment the RV epicardium and endocardium.

Methods: By integrating the spatial features from each cardiac frame of the gated MPS and the temporal features from the sequential cardiac frames, we developed a Spatial‐Temporal V‐Net (ST‐VNet) for automatic extraction of RV endocardial and epicardial contours. In the ST‐VNet, a V‐Net hierarchically extracts spatial features, and convolutional long short‐term memory (ConvLSTM) units are added to the skip‐connection pathway to extract temporal features. The input of the ST‐VNet is the ECG‐gated sequential frames of the MPS images, and the output is the probability map of the epicardial or endocardial mask. A Dice similarity coefficient (DSC) loss, which penalizes the discrepancy between the model prediction and the manual annotation, was adopted to optimize the segmentation model.

Results: Our segmentation model was trained and validated on a retrospective dataset of 45 subjects, with the cardiac cycle of each subject divided into eight gates. The proposed ST‐VNet achieved a DSC of 0.8914 and 0.8157 for RV epicardium and endocardium segmentation, respectively. The mean absolute error, the mean squared error, and the Pearson correlation coefficient of the RV ejection fraction (RVEF) between the manual annotation and the model prediction were 0.0609, 0.0830, and 0.6985, respectively.

Conclusion: Our proposed ST‐VNet is an effective model for RV segmentation and shows promise for clinical use in RV functional assessment.
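The DSC loss mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not give the exact formulation used for the ST‐VNet, so the code below assumes the common "soft Dice" form, 1 − 2|P∩G| / (|P| + |G|), computed on a predicted probability map and a binary manual mask. The function names, the smoothing term `eps`, and the toy arrays are hypothetical and serve only as an illustration.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-6):
    """Soft Dice similarity coefficient between a probability map and a binary mask.

    pred   : array of predicted foreground probabilities in [0, 1]
    target : array of the same shape holding the manual annotation (0 or 1)
    eps    : small smoothing constant (an assumption) that avoids division by zero
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(pred * target)      # soft overlap |P ∩ G|
    denom = np.sum(pred) + np.sum(target)     # |P| + |G|
    return (2.0 * intersection + eps) / (denom + eps)


def dice_loss(pred, target, eps=1e-6):
    """Loss that decreases as the overlap between prediction and annotation grows."""
    return 1.0 - dice_coefficient(pred, target, eps)


# Toy example: 4 voxels, prediction covers 2 voxels, annotation covers 1 of them.
toy_pred = np.array([1.0, 1.0, 0.0, 0.0])
toy_mask = np.array([1.0, 0.0, 0.0, 0.0])
toy_dsc = dice_coefficient(toy_pred, toy_mask)   # 2*1 / (2 + 1), about 0.667
toy_loss = dice_loss(toy_pred, toy_mask)
```

Because the functions sum over all array elements, the same sketch applies unchanged to a 3D volume or to a stack of gated frames; whether the original work averages the loss per frame or per volume is not stated in the abstract.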