The parallel performance of modern GPUs surpasses that of traditional multi-core CPUs. Many researchers have therefore begun testing AI algorithms on GPUs instead of CPUs, especially since the release of frameworks such as CUDA and OpenCL, which allow general-purpose algorithms to be implemented on the GPU. One of the best-known game tree search algorithms is Negamax, which searches for the optimal next move in zero-sum games. This research presents an implementation of an enhanced parallel Negamax algorithm that runs on the GPU using CUDA. The enhanced algorithm uses techniques such as avoiding divergence, dynamic parallelism, and a shared GPU table. The approach was tested on checkers and chess. It was compared with previous studies, including a multithreaded CPU implementation that achieved up to 6x speedup on an 8-core processor, and a multithreaded GPU implementation using iterative dependence with fixed grid and block sizes that achieved up to 40x speedup at depth 14. The approach was also tested at different search depths on both the CPU and the GPU. The results show a speedup for the parallel GPU implementation of up to 80x at depth 14 for checkers and up to 90x at depth 14 for chess, roughly double the best previously reported result.
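For readers unfamiliar with Negamax, the following is a minimal sequential sketch of the basic recursion, not the parallel CUDA implementation described in this work. It assumes a toy subtraction game chosen purely for illustration (players alternately remove 1 to 3 stones, and whoever takes the last stone wins); the function names `negamax` and `best_move` are hypothetical.

```python
# Illustrative sequential Negamax on a toy subtraction game (an assumption
# for this sketch; the paper itself targets checkers and chess on the GPU).

MOVES = (1, 2, 3)  # a player may remove 1, 2, or 3 stones per turn


def negamax(stones: int, depth: int) -> int:
    """Return the score from the point of view of the player to move.

    +1 means a forced win, -1 a forced loss, 0 unknown (depth cutoff).
    """
    if stones == 0:
        # The opponent took the last stone, so the player to move has lost.
        return -1
    if depth == 0:
        # Depth limit reached: fall back to a neutral heuristic value.
        return 0
    # Zero-sum property: my score is the negation of the opponent's score.
    return max(-negamax(stones - m, depth - 1) for m in MOVES if m <= stones)


def best_move(stones: int, depth: int) -> int:
    """Pick the move whose resulting position is worst for the opponent."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: -negamax(stones - m, depth - 1))


if __name__ == "__main__":
    print(negamax(4, 14))    # losing position for the player to move
    print(best_move(7, 14))  # move that leaves the opponent with 4 stones
```

The single `max` over negated child scores is what distinguishes Negamax from a two-function minimax: because the game is zero-sum, one routine serves both players. Each child subtree is evaluated independently of its siblings, which is the property a parallel search can exploit.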