BACKGROUND: Medical image segmentation is crucial for disease diagnosis and treatment planning, and deep learning (DL) techniques have shown promise for this task. However, optimizing DL models requires setting numerous parameters and demands large labeled datasets, which are labor-intensive to create.
OBJECTIVE: This study proposes a semi-supervised model that uses both labeled and unlabeled data to accurately segment kidneys, tumors, and cysts on CT images, even when labeled samples are limited.
METHODS: We designed an end-to-end semi-supervised learning model, MTAN (Mean Teacher Attention N-Net), to segment kidneys, tumors, and cysts on CT images. MTAN is built on the AN-Net architecture, which serves as both teacher and student. In the student role, AN-Net is trained in the conventional way. In the teacher role, it generates targets and guides the student model in using them, improving the quality of learning. Because MTAN is semi-supervised, it can exploit unlabeled data during training, which improves performance and reduces overfitting.
RESULTS: We evaluated the proposed model on two CT image datasets, KiTS19 and KiTS21. On KiTS19, MTAN achieved average Dice scores of 0.975 for kidneys and 0.869 for tumors. On KiTS21, MTAN demonstrated its robustness, achieving average Dice scores of 0.977 for kidneys, 0.886 for masses (tumors and cysts combined), 0.861 for tumors, and 0.759 for cysts.
CONCLUSION: The proposed MTAN model offers an effective solution for accurate medical image segmentation, particularly when labeled data are scarce. By exploiting unlabeled data through semi-supervised learning, MTAN mitigates overfitting and achieves high-quality segmentation. Its consistent performance across two distinct datasets, KiTS19 and KiTS21, underscores the model's reliability and its potential as a clinical reference.
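The teacher-student scheme and the Dice metric above can be illustrated with a minimal sketch. MTAN's exact update rule and loss weighting are not stated here, so this assumes the standard Mean Teacher recipe: the teacher's weights are an exponential moving average (EMA) of the student's weights, and a consistency loss pulls the student's predictions on unlabeled images toward the teacher's targets. All function names, the decay `alpha`, and the consistency weight are hypothetical, and plain Python lists stand in for network weights, probability maps, and flattened binary masks.

```python
from typing import List, Sequence

# Hypothetical sketch of the standard Mean Teacher scheme; not MTAN's published code.


def ema_update(teacher: Sequence[float], student: Sequence[float],
               alpha: float = 0.99) -> List[float]:
    """Move teacher weights toward the student: t <- alpha * t + (1 - alpha) * s."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of weights")
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs: Sequence[float],
                     teacher_probs: Sequence[float]) -> float:
    """Mean squared error between student predictions and teacher targets.

    Needs no ground-truth labels, so it can be computed on unlabeled images.
    """
    if len(student_probs) != len(teacher_probs) or not student_probs:
        raise ValueError("predictions must be non-empty and equal in length")
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


def total_loss(supervised: float, consistency: float, weight: float = 1.0) -> float:
    """Assumed objective: supervised loss on labeled data plus weighted consistency."""
    return supervised + weight * consistency


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) for binary masks flattened to 0/1 sequences."""
    intersection = sum(p * g for p, g in zip(pred, target))
    denom = sum(pred) + sum(target)
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * intersection / denom


if __name__ == "__main__":
    print(ema_update([0.0, 1.0], [1.0, 1.0], alpha=0.5))      # [0.5, 1.0]
    print(consistency_loss([1.0, 0.5], [0.5, 0.5]))           # 0.125
    print(dice_score([1, 1, 0, 0], [1, 1, 1, 0]))             # 0.8
```

Only the student is updated by gradient descent on `total_loss`; the teacher changes solely through `ema_update`, which is what lets it act as a smoothed, more stable source of targets for the unlabeled data.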