Optical testing is constantly evolving and demands ever higher lateral resolution in interferometry. Higher resolution, however, leads to longer processing times and significantly reduces testing efficiency. The phase unwrapping algorithm is crucial in interferometry, but its complex calculations can impede efficiency improvements. Phase unwrapping algorithms fall into two categories: path-dependent and path-independent. Path-dependent algorithms tend to be more efficient, so we chose to accelerate a path-dependent algorithm; among these, Goldstein's algorithm is the most widely applied. This study uses CPU-GPU heterogeneous computing to parallelize and accelerate the Goldstein phase unwrapping algorithm while keeping numerical error within acceptable limits. Our approach optimizes the serial Goldstein algorithm for GPU architectures by parallelizing and enhancing its three key steps: residue identification, branch cutting, and integration. In particular, the optimization leverages GPU shared memory and SIMD functionality. To assess the efficiency of the proposed method, we tested it on wrapped phase images of varying sizes. The results show that as the image size increases, the performance gain of the GPU computation over the CPU computation becomes more pronounced. For a 4096×4096 phase map on a laptop RTX 3070 GPU, the overall process runs about 60× faster than the CPU version. Therefore, running the algorithm on the GPU can significantly accelerate phase unwrapping and improve the efficiency of interferometry.
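To illustrate why the first step, residue identification, maps well onto the GPU, the following is a minimal CUDA sketch (not the authors' implementation): each thread independently evaluates one 2×2 loop of the wrapped phase and marks it as a positive residue, a negative residue, or residue-free. The kernel and helper names (residueKernel, wrapDiff) and the row-major float layout are illustrative assumptions; the shared-memory and SIMD optimizations described in the paper are omitted here.

```cuda
#include <cuda_runtime.h>

// Wrap a phase difference into (-pi, pi].
__device__ float wrapDiff(float d)
{
    const float PI = 3.14159265358979f;
    while (d >  PI) d -= 2.0f * PI;
    while (d <= -PI) d += 2.0f * PI;
    return d;
}

// phase:   wrapped phase image, row-major, width*height floats
// residue: output map of the same size; +1 positive residue, -1 negative, 0 none
__global__ void residueKernel(const float* phase, int* residue,
                              int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width - 1 || y >= height - 1) return;   // the 2x2 loop needs x+1, y+1

    // The four corners of the 2x2 pixel loop.
    float p00 = phase[y * width + x];
    float p01 = phase[y * width + x + 1];
    float p11 = phase[(y + 1) * width + x + 1];
    float p10 = phase[(y + 1) * width + x];

    // Sum of wrapped differences around the closed loop: approximately
    // +2*pi for a positive residue, -2*pi for a negative one, 0 otherwise.
    float s = wrapDiff(p01 - p00) + wrapDiff(p11 - p01)
            + wrapDiff(p10 - p11) + wrapDiff(p00 - p10);

    const float PI = 3.14159265358979f;
    residue[y * width + x] = (s > PI) ? 1 : (s < -PI) ? -1 : 0;
}
```

Because every 2×2 loop is evaluated independently with purely local reads, this step is embarrassingly parallel, which is one reason the GPU version scales better than the CPU version as the image size grows.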