Semantic segmentation of bone from lower extremity computerized tomography (CT) scans can improve and accelerate visualization, diagnosis, and surgical planning in orthopaedics. However, the large field of view of these scans makes automatic segmentation with deep-learning-based methods challenging, slow, and graphics processing unit (GPU) memory intensive. We investigated methods to represent anatomical context more efficiently for accurate and fast segmentation, and compared them with state-of-the-art methodology. Six lower extremity bones were manually segmented from the CT scans of patients in two different datasets and used to train and optimize a cascaded deep learning approach. We varied the number of resolution levels, receptive fields, patch sizes, and number of V-net blocks. The best-performing network used a multi-stage, cascaded V-net approach with 128³-64³-32³ voxel patches as input. The average Dice coefficient over all bones was 0.98 ± 0.01, the mean surface distance was 0.26 ± 0.12 mm, and the 95th percentile Hausdorff distance was 0.65 ± 0.28 mm. This was a significant improvement over the results of the state-of-the-art nnU-net, while requiring only approximately 1/12 of the training time, 1/3 of the inference time, and 1/4 of the GPU memory. Comparison of morphometric measurements performed on automatic and manual segmentations showed good correlation (intraclass correlation coefficient [ICC] > 0.8) for the alpha angle and excellent correlation (ICC > 0.95) for the hip-knee-ankle angle, femoral inclination, femoral version, acetabular version, lateral centre-edge angle, and acetabular coverage. The segmentations were generally of sufficient quality for the tested clinical applications and were produced accurately and quickly compared with state-of-the-art methodology from the literature.
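
For readers unfamiliar with the Dice coefficient reported above, the following is a minimal sketch of how such an overlap score can be computed between an automatic and a manual binary mask. It is an illustration under stated assumptions, not the evaluation code used in this work: the function name `dice_coefficient`, the toy 8 × 8 × 8 volume, and the convention of returning 1.0 for two empty masks are all hypothetical choices made here.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice overlap between two binary masks of identical shape.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    # Dice = 2 * |A intersect B| / (|A| + |B|)
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


# Toy example: two 4 x 4 x 4 cubes in a small synthetic volume,
# the second shifted by one voxel along the first axis.
ref = np.zeros((8, 8, 8), dtype=bool)
ref[2:6, 2:6, 2:6] = True   # 64 voxels
pred = np.zeros((8, 8, 8), dtype=bool)
pred[3:7, 2:6, 2:6] = True  # 64 voxels, 48 of which overlap with ref

print(dice_coefficient(pred, ref))  # 2 * 48 / (64 + 64) = 0.75
```

In a multi-bone setting such as the one described here, a score of this kind would typically be computed per bone label and then averaged; the surface-based metrics (mean surface distance, 95th percentile Hausdorff distance) additionally require the voxel spacing and are not covered by this sketch.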