Abstract. This paper presents a new parallel algorithm for nonrigid image registration on off-the-shelf supercomputers, that is, clusters of PCs. The algorithm achieves scalable registration of high-resolution three-dimensional (3-D) images by combining three techniques: (1) data distribution, (2) data-parallel processing, and (3) dynamic load balancing. Experimental results show that our parallel implementation on a cluster of 64 off-the-shelf PCs (128 processors) registers liver CT images of 512×512×159 voxels in under 8 minutes, whereas a sequential implementation takes 12 hours. Furthermore, our implementation reduces the memory required per processor, which enables us to align images of 1024×1024×590 voxels; such images are difficult to register on single-processor systems because of limits on memory space and processing time.
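As a rough reading of the reported timings, taking the 8-minute parallel time as an upper bound and the 12-hour sequential time at face value, the implied speedup $S$ and parallel efficiency $E$ on $P = 128$ processors are at least:

```latex
S \ge \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}} = \frac{12 \times 60\ \mathrm{min}}{8\ \mathrm{min}} = 90,
\qquad
E = \frac{S}{P} \ge \frac{90}{128} \approx 0.70
```

These are back-of-the-envelope figures derived only from the numbers stated above, not separately reported measurements.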