Computer-aided modeling of anatomic deformation, which allows various techniques and protocols in radiation therapy to be systematically verified and studied, has become increasingly attractive. In this study the potential issues in deformable image registration (DIR) were analyzed based on two numerical phantoms: one a synthesized, low intensity gradient prostate image, and the other a lung patient's CT image data set. Each phantom was modeled with region-specific material parameters, and its deformation was solved using a finite element method. The resultant displacements were used to construct a benchmark to quantify the displacement errors of the Demons and B-Spline-based registrations. The results show that the accuracy of these registration algorithms depends on the chosen parameters, the selection of which is closely associated with the intensity gradients of the underlying images. For the Demons algorithm, both single resolution (SR) and multiresolution (MR) registrations required approximately 300 iterations to reach an accuracy of 1.4 mm mean error in the lung patient's CT image (and 0.7 mm mean error averaged in the lung only). For the low gradient prostate phantom, these algorithms (both SR and MR) required at least 1600 iterations to reduce their mean errors to 2 mm. For the B-Spline algorithms, the best performance on the low gradient prostate phantom (mean errors of 1.9 mm for SR and 1.6 mm for MR) was achieved using five grid nodes in each direction. Adding more grid nodes resulted in larger errors. For the lung patient's CT data set, the B-Spline registrations required ten grid nodes in each direction for highest accuracy (1.4 mm for SR and 1.5 mm for MR). The numbers of iterations or grid nodes required for optimal registrations depended on the intensity gradients of the underlying images. In summary, the performance of the Demons and B-Spline registrations has been quantitatively evaluated using numerical phantoms.
The results show that parameter selection for optimal accuracy is closely related to the intensity gradients of the underlying images. In addition, the finding that the DIR algorithms produce much lower errors in heterogeneous lung regions than in homogeneous (low intensity gradient) regions suggests that feature-based evaluation of deformable image registration accuracy must be viewed cautiously.
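The mean errors quoted above compare a registration's displacement field against the finite element benchmark, either over the whole image or over one region such as the lung. The abstract does not state the exact error formula, so the following is only a minimal sketch under one common assumption: the error at each voxel is the Euclidean distance between the benchmark and estimated displacement vectors, and the reported figure is the average of those distances. The function name, the flat list layout, and the toy values are hypothetical and are not taken from the study.

```python
import math


def mean_displacement_error(benchmark, estimated, mask=None):
    """Mean Euclidean distance (mm) between two displacement fields.

    benchmark, estimated: sequences of (dx, dy, dz) vectors in mm, one per voxel.
    mask: optional sequence of booleans selecting a region (e.g. lung only).
    Assumption: per-voxel error is the vector difference magnitude.
    """
    if len(benchmark) != len(estimated):
        raise ValueError("displacement fields must have the same number of voxels")
    if mask is None:
        mask = [True] * len(benchmark)
    elif len(mask) != len(benchmark):
        raise ValueError("mask must have one entry per voxel")
    errors = [math.dist(b, e) for b, e, m in zip(benchmark, estimated, mask) if m]
    if not errors:
        raise ValueError("mask selects no voxels")
    return sum(errors) / len(errors)


# Toy data (hypothetical): four voxels, the first two flagged as "lung".
benchmark = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
estimated = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
lung_mask = [True, True, False, False]

whole_image_error = mean_displacement_error(benchmark, estimated)
lung_only_error = mean_displacement_error(benchmark, estimated, lung_mask)
print(whole_image_error, lung_only_error)
```

Reporting the whole-image and region-restricted averages side by side, as in this sketch, mirrors the distinction the abstract draws between the 1.4 mm whole-image error and the 0.7 mm lung-only error. It also shows why an error measured only where image features are strong can understate the error in homogeneous regions.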