Objective: 
Deformable image registration (DIR) for contour propagation is often validated using contour-based metrics. Dose accumulation, however, requires evaluation of voxel mapping accuracy, which contour-based metrics might not represent accurately. By fabricating a deformable anthropomorphic pelvis phantom, we aim to (1) quantify the voxel mapping accuracy for various deformation scenarios, in high- and low-contrast regions, and (2) identify any correlation between the Dice similarity coefficient (DSC), a commonly used contour-based metric, and the voxel mapping accuracy for each organ.
Approach: 
Four organs, i.e., pelvic bone, prostate, bladder and rectum, were 3D printed using PLA and a Polyjet digital material, and assembled. The latter three were implanted with glass beads and CT markers within or on their surfaces. Four deformation scenarios were simulated by varying the bladder and rectum volumes. For each scenario, nine DIRs with different parameters were performed in RayStation v10B. The voxel mapping accuracy was quantified as the discrepancy between the true and mapped marker positions, termed the target registration error (TRE). A Pearson correlation test was performed between the DSC and the mean TRE for each organ.
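The two metrics and the correlation described above can be sketched as follows. This is a minimal illustration only, not the analysis code used in this work: the function names and all input values are hypothetical, marker coordinates are assumed to be given in cm, and only the Pearson coefficient r is computed with NumPy (the significance test is omitted).

```python
import numpy as np

def target_registration_error(true_pts, mapped_pts):
    """Per-marker TRE: Euclidean distance between true and DIR-mapped positions."""
    true_pts = np.asarray(true_pts, dtype=float)      # shape (n_markers, 3)
    mapped_pts = np.asarray(mapped_pts, dtype=float)  # shape (n_markers, 3)
    return np.linalg.norm(true_pts - mapped_pts, axis=1)

def dice(mask_a, mask_b):
    """Dice similarity coefficient of two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # undefined when both masks are empty
    return 2.0 * np.logical_and(a, b).sum() / denom

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1D sequences."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical marker positions (cm) for one organ in one registration
true_markers = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
mapped_markers = [[0.3, 0.0, 0.0], [1.0, 0.4, 0.0]]
mean_tre = target_registration_error(true_markers, mapped_markers).mean()

# Hypothetical per-registration DSC and mean TRE (cm) values for one organ
dsc_values = [0.95, 0.90, 0.80]
mean_tre_values = [0.1, 0.2, 0.4]
r = pearson_r(dsc_values, mean_tre_values)
```

A strongly negative r for an organ would indicate that higher DSC tends to accompany lower mean TRE, which is the relationship tested here.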
Main results: 
For the first time, we fabricated a deformable phantom entirely by 3D printing, which successfully reproduced realistic anatomical deformations. Overall, the voxel mapping accuracy dropped with increasing deformation magnitude, but improved when more organs were used to guide the DIR or to limit the registration region. DSC was found to be a good indicator of voxel mapping accuracy for prostate and rectum, but a comparatively poorer one for bladder. DSC > 0.85 for rectum and DSC > 0.90 for prostate were established as the thresholds corresponding to a mean TRE ≤ 0.3 cm. For bladder, extra metrics in addition to DSC should be considered.
Significance: 
This work presented a 3D printed deformable phantom that enabled quantification of voxel mapping accuracy and evaluation of the correlation between DSC and voxel mapping accuracy.