Objective. Radiation therapy (RT) is a common treatment modality for head and neck (H&N) cancer. A crucial step in RT planning is the precise delineation of organs at risk (OARs) on computed tomography (CT) scans. However, manual delineation of OARs is labor-intensive: each CT slice must be inspected individually, and a standard CT scan comprises hundreds of slices. Furthermore, there is a significant domain shift between H&N data from different institutions, which makes traditional semi-supervised learning strategies susceptible to confirmation bias. Effectively using unlabeled datasets to complement annotated datasets during model training has therefore become a critical issue for mitigating domain shift and confirmation bias. Approach. In this work, we propose a Cross-Domain Orthogon-based-Perspective Consistency (CD-OPC) strategy within a two-branch collaborative training framework, which compels the two sub-networks to learn useful features from unrelated perspectives. Specifically, we design a novel generative pretext task, Cross-Domain Prediction (CDP), to learn inherent properties of CT images. This prior knowledge is then used to encourage the two sub-networks to learn distinct features independently from identical inputs, enhancing their perceptual capabilities through Orthogon-based Pseudo-Labeling Knowledge Transfer (OPKT). Main results. Our CD-OPC model was trained on H&N datasets from nine different institutions and validated on H&N datasets from four local institutions. On all datasets, CD-OPC outperformed the other semi-supervised semantic segmentation algorithms it was compared with. Significance. The CD-OPC method mitigates domain shift and prevents network collapse.
In addition, it enhances the network's perceptual abilities and generates more reliable predictions, thereby further addressing the confirmation bias issue.
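To make the two-branch idea more concrete, the following is a minimal NumPy sketch of an unlabeled-data loss that combines (i) a feature-orthogonality penalty, which pushes the two sub-networks toward unrelated perspectives, and (ii) cross pseudo-labeling, in which each branch is supervised by the other's hard predictions. This is an illustrative assumption, not the CD-OPC/OPKT implementation: the abstract does not specify the loss forms. Here orthogonality is assumed to be the mean squared cosine similarity between per-sample features, the pseudo-labeling follows the generic cross pseudo supervision pattern, and all function names and the weight `lam` are hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def orthogonality_loss(feat_a, feat_b, eps=1e-8):
    """Hypothetical penalty: mean squared cosine similarity of paired features.

    feat_a, feat_b: arrays of shape (N, D) from the two sub-networks.
    Returns ~0 when the feature vectors are orthogonal and ~1 when parallel.
    """
    a = feat_a / (np.linalg.norm(feat_a, axis=1, keepdims=True) + eps)
    b = feat_b / (np.linalg.norm(feat_b, axis=1, keepdims=True) + eps)
    cos = np.sum(a * b, axis=1)
    return float(np.mean(cos ** 2))

def cross_entropy(logits, labels, eps=1e-12):
    # logits: (N, C); labels: integer class index per voxel, shape (N,)
    p = softmax(logits, axis=1)
    picked = p[np.arange(labels.shape[0]), labels]
    return float(-np.mean(np.log(picked + eps)))

def cross_pseudo_label_loss(logits_a, logits_b):
    """Each branch is trained on the other branch's argmax pseudo-labels.

    logits_*: (N, C) with N flattened voxels and C classes (OARs + background).
    In a real training loop the pseudo-labels would be detached (no gradient).
    """
    pseudo_a = logits_a.argmax(axis=1)
    pseudo_b = logits_b.argmax(axis=1)
    return cross_entropy(logits_a, pseudo_b) + cross_entropy(logits_b, pseudo_a)

def unlabeled_loss(feat_a, feat_b, logits_a, logits_b, lam=0.1):
    # lam: hypothetical weight balancing pseudo-label transfer and orthogonality
    return cross_pseudo_label_loss(logits_a, logits_b) + lam * orthogonality_loss(feat_a, feat_b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fa, fb = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    la, lb = rng.normal(size=(16, 3)), rng.normal(size=(16, 3))
    print(unlabeled_loss(fa, fb, la, lb))
```

The orthogonality term only constrains the intermediate features, so the two branches can still agree on the final segmentation while encoding it differently; this is one plausible way to reduce the tendency of co-trained networks to collapse onto the same (possibly biased) predictions.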