Abstract—GPUs (Graphics Processing Units) offer massive thread-level parallelism and are a promising platform for high-performance computing. The emergence of CUDA (Compute Unified Device Architecture) opened the door to harnessing the GPU's computing power. However, due to a limitation of CUDA itself, direct communication between SMs (streaming multiprocessors) on the GPU is not supported, and coordinating them through atomic operations or barrier synchronization is time-consuming. This paper proposes a synchronization mechanism that reduces the number of kernel launches on the premise that the results remain correct: each launched kernel performs a sufficient amount of computation on the GPU before returning its results to the CPU. The effectiveness of this method is demonstrated with the delta-stepping algorithm for the SSSP (single-source shortest path) problem. On the Facebook dataset, the speedup over atomic operations is about 1.8; on the New York road map dataset, the speedups over atomic operations and barrier synchronization are about 9.3 and 1.7, respectively.