Accurate annotation of vertebral bodies is crucial for automating the analysis of spinal X-ray images. However, manual annotation of these structures is laborious and costly because of their complex nature, including small sizes and varying shapes. To expedite the annotation process, we propose an ensemble pipeline called VertXNet. The pipeline currently combines two segmentation mechanisms, semantic segmentation using U-Net and instance segmentation using Mask R-CNN, to automatically segment and label vertebral bodies in lateral cervical and lumbar spinal X-ray images. VertXNet merges the segmentation outcomes of U-Net and Mask R-CNN using a rule-based strategy (termed the ensemble rule). It determines vertebral body labels by recognizing specific reference vertebral instances, such as cervical vertebra 2 (‘C2’) in cervical spine X-rays and sacral vertebra 1 (‘S1’) in lumbar spine X-rays. These reference vertebrae lie at the edge of the spine and are usually relatively easy to identify. To assess the performance of the proposed pipeline, we evaluated it on three spinal X-ray datasets, including two in-house datasets and one publicly available dataset, with ground truth annotations provided by radiologists for comparison. The proposed pipeline outperformed two state-of-the-art (SOTA) segmentation models on our test dataset, achieving a mean Dice of 0.90 versus 0.73 for Mask R-CNN and 0.72 for U-Net. We also demonstrated that VertXNet is modular: other SOTA models, such as nnU-Net, can be used within the pipeline to further improve its performance. Furthermore, to evaluate the generalization ability of VertXNet on spinal X-rays, we directly tested the pre-trained pipeline on two additional datasets and observed consistently strong performance, with mean Dice coefficients of 0.89 and 0.88, respectively.
In summary, VertXNet demonstrated significantly improved performance in vertebral body segmentation and labeling for spinal X-ray imaging. Its robustness and generalization were demonstrated through evaluation on both in-house clinical trial data and publicly available data.
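
For readers unfamiliar with the metric, the Dice coefficient reported above measures the overlap between a predicted mask A and a ground truth mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The following is a minimal sketch of that computation, not the evaluation code used in this work. The function names `dice_coefficient` and `mean_dice`, the representation of masks as nested lists of 0/1 values, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape.

    Masks are nested lists of 0/1 values (an illustrative representation).
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    # Count foreground pixels in each mask and in their intersection.
    intersection = sum(
        1
        for p_row, t_row in zip(pred, truth)
        for p, t in zip(p_row, t_row)
        if p and t
    )
    pred_size = sum(1 for row in pred for p in row if p)
    truth_size = sum(1 for row in truth for t in row if t)

    if pred_size + truth_size == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / (pred_size + truth_size)


def mean_dice(mask_pairs):
    """Average Dice over an iterable of (predicted, ground truth) mask pairs."""
    scores = [dice_coefficient(pred, truth) for pred, truth in mask_pairs]
    if not scores:
        raise ValueError("at least one mask pair is required")
    return sum(scores) / len(scores)


# Toy 3x3 example: each mask has 4 foreground pixels, 3 of which overlap.
example_pred = [[0, 1, 1],
                [0, 1, 1],
                [0, 0, 0]]
example_truth = [[0, 1, 1],
                 [0, 1, 0],
                 [0, 1, 0]]
example_score = dice_coefficient(example_pred, example_truth)  # 2*3 / (4+4) = 0.75
```

In this toy case the score is 0.75. A mean Dice such as those reported above would correspond to averaging per-mask scores in this manner, although the exact averaging scheme (per vertebra or per image) is not specified here.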