The purpose of this study was to compare the performance of 2D, 2.5D, and 3D CNN-based segmentation networks, along with a 3D vision transformer-based segmentation network, for segmenting the mandibular canal (MC) on public and external CBCT datasets under the same GPU memory capacity. We also performed ablation studies on an image-cropping (IC) technique and on segmentation loss functions. On the public test dataset, 3D-UNet outperformed the 2D and 2.5D segmentation networks for MC segmentation, achieving 0.569 ± 0.107, 0.719 ± 0.092, 0.664 ± 0.131, and 0.812 ± 0.095 in terms of JI, DSC, PR, and RC, respectively. On the external test dataset, 3D-UNet achieved 0.564 ± 0.092, 0.716 ± 0.081, 0.812 ± 0.087, and 0.652 ± 0.103 in terms of JI, DSC, PR, and RC, respectively. The IC technique and the multi-planar Dice loss improved the boundary details and the structural connectivity of the MC from the mental foramen to the mandibular foramen. 3D-UNet demonstrated superior MC segmentation performance by learning volumetric contextual information covering the entire MC within the CBCT volume.
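For clarity, the reported metrics are the standard voxel-wise overlap measures between the predicted segmentation P and the ground-truth annotation G:

\[
\mathrm{JI} = \frac{|P \cap G|}{|P \cup G|}, \qquad
\mathrm{DSC} = \frac{2\,|P \cap G|}{|P| + |G|}, \qquad
\mathrm{PR} = \frac{TP}{TP + FP}, \qquad
\mathrm{RC} = \frac{TP}{TP + FN},
\]

where TP, FP, and FN denote voxel-wise true positives, false positives, and false negatives. As an illustrative sketch only, and not the exact formulation used in this study, a multi-planar Dice loss can be written by averaging slice-wise Dice terms computed along three orthogonal viewing axes (axial, coronal, and sagittal are assumed here):

\[
\mathcal{L}_{\mathrm{MPD}} = 1 - \frac{1}{3} \sum_{v \in \{\mathrm{ax},\,\mathrm{cor},\,\mathrm{sag}\}} \frac{1}{N_v} \sum_{s=1}^{N_v} \mathrm{DSC}\!\left(P_s^{(v)}, G_s^{(v)}\right),
\]

where \(P_s^{(v)}\) and \(G_s^{(v)}\) are the s-th predicted and ground-truth slices along view v, and \(N_v\) is the number of slices along that view.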