An efficient graphics processing unit (GPU) version of a time-dependent wavepacket code is developed for atom–diatom state-to-state reactive scattering processes. After the initial wavepacket is prepared on the central processing unit (CPU), its propagation is carried out entirely on GPUs using the split-operator method. An additional split-operator step is introduced for the rotational part of the Hamiltonian to reduce communication between GPUs without losing accuracy in the state-to-state information. The code is tested by calculating the differential cross sections of the H + H₂ reaction and the state-resolved reaction probabilities of nonadiabatic triplet–singlet transitions in O(³P, ¹D) + H₂ for total angular momentum J = 0. Relative to serial computation on the CPU, global speedups of 22.11, 38.80, and 44.80 are found for parallel computation on one GPU, on two GPUs with the exact rotational operator, and on two GPUs with an approximate rotational operator, respectively.
INTRODUCTION
A state-to-state quantum reaction dynamics study provides the most detailed observables and offers profound insight into chemical reaction processes. In the past decades, several approaches to quantum reaction dynamics have been developed. Using hyperspherical coordinates, the time-independent (TID) coupled-channel method has been employed to study many triatomic systems. 1,2 On the other hand, the time-dependent method has been widely used and implemented in different schemes because its cost does not scale as the cube of the number of basis functions. The coordinates can be chosen as reactant Jacobi, product Jacobi, or hyperspherical coordinates. 3−5 The wavepacket can be real or complex 6,7 and can be propagated by the split-operator method, 8,9 Chebyshev polynomial expansion, finite difference approaches, and so forth. 10,11 Although calculations of differential cross sections (DCSs) have been reported for many triatomic reactions and two tetraatomic systems up to now, 12 such calculations remain very demanding in computational time. Thus, parallelization on central processing units (CPUs) has become routine for state-resolved quantum dynamics. Because large-scale parallelization over hundreds of computational nodes is still expensive, parallelization on graphics processing units (GPUs) provides a good alternative, since a single GPU gives access to hundreds of cores. The GPU has been widely used in many traditional areas of computational science. In chemistry, one GPU provides hundreds of times the performance of a single CPU core in molecular dynamics and electronic structure calculations. 13,14 Recently, a hybrid CPU/GPU implementation of the TID method showed a speedup of 6.98 using three GPUs and three CPU cores. 15 Pacifici et al. have reported a GPU implementation for reactive scattering through the evaluation of the reactive