Background: PET/CT images combining anatomic and metabolic data provide complementary information that can improve clinical task performance. However, PET image segmentation algorithms that exploit the available multi‐modal information are still lacking.
Purpose: Our study aimed to assess the performance of PET and CT image fusion for gross tumor volume (GTV) segmentation of head and neck cancers (HNCs) using conventional, deep learning (DL), and output‐level voting‐based fusions.
Methods: This study is based on a total of 328 histologically confirmed HNCs from six different centers. The images were automatically cropped to a 200 × 200 box covering the head and neck region, and CT and PET images were normalized for further processing. Eighteen conventional image‐level fusions were implemented. In addition, a modified U2‐Net architecture was used as the baseline DL fusion model, with three information fusion strategies: input‐level, layer‐level, and decision‐level. Simultaneous truth and performance level estimation (STAPLE) and majority voting were employed to merge the different segmentation outputs (from PET, image‐level fusions, and network‐level fusions), that is, output‐level information fusion (voting‐based fusions). The networks were trained in a 2D manner with a batch size of 64. Twenty percent of the dataset, stratified by center (20% from each center), was used for final result reporting. Standard segmentation metrics and conventional PET metrics, such as SUV, were calculated.
Results: Among single modalities, PET performed reasonably, with a Dice score of 0.77 ± 0.09, whereas CT did not perform acceptably, reaching a Dice score of only 0.38 ± 0.22.
Conventional fusion algorithms obtained Dice scores ranging from 0.76 to 0.81, with guided‐filter‐based context enhancement (GFCE) at the low end, and anisotropic diffusion and Karhunen–Loeve transform fusion (ADF), multi‐resolution singular value decomposition (MSVD), and multi‐level image decomposition based on latent low‐rank representation (MDLatLRR) at the high end. All DL fusion models achieved Dice scores of 0.80. Output‐level voting‐based models outperformed all other models, with a Dice score of 0.84 for Majority_ImgFus, Majority_All, and Majority_Fast. A mean error of almost zero was achieved for all fusions in terms of SUVpeak, SUVmean, and SUVmedian.
Conclusion: PET/CT information fusion adds significant value to segmentation tasks, considerably outperforming PET‐only and CT‐only methods. In addition, both conventional image‐level and DL fusions achieve competitive results. Moreover, output‐level voting‐based fusion using majority voting of several algorithms results in statistically significant improvements in the segmentation of HNC.
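
To make the output‐level voting idea and the Dice metric concrete, the following is a minimal sketch, not the study's implementation. It assumes binary 2D masks stored as nested lists and a strict pixel‐wise majority rule. The function names (`dice_score`, `majority_vote`) and the toy masks are hypothetical and chosen only for illustration. STAPLE is not shown.

```python
from typing import List, Sequence

Mask = List[List[int]]  # 2D binary mask: 1 = tumor, 0 = background


def dice_score(pred: Mask, truth: Mask) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); returns 1.0 when both masks are empty."""
    intersection = sum(
        p & t
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 1.0 if total == 0 else 2.0 * intersection / total


def majority_vote(masks: Sequence[Mask]) -> Mask:
    """Mark a pixel as tumor when more than half of the input masks do."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


# Toy example (hypothetical data): three imperfect segmentations of one slice.
truth = [[0, 1, 1], [0, 1, 0]]
seg_a = [[0, 1, 0], [0, 1, 0]]  # misses one tumor pixel
seg_b = [[0, 1, 1], [1, 1, 0]]  # adds one false-positive pixel
seg_c = [[0, 0, 1], [0, 1, 0]]  # misses a different tumor pixel

fused = majority_vote([seg_a, seg_b, seg_c])
scores = [dice_score(s, truth) for s in (seg_a, seg_b, seg_c, fused)]
```

In this toy case the individual errors fall on different pixels, so the vote cancels them out and the fused mask scores higher than any single input. This is only an intuition for why voting across several algorithms can help; it says nothing about the magnitude of the gains reported above.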