As one of the arbitrary Lagrangian-Eulerian methods, the material point method (MPM) offers intrinsic advantages in the simulation of large-deformation problems by combining the merits of the Lagrangian and Eulerian approaches. The MPM is nonetheless computationally intensive, because a very fine mesh is required to achieve sufficiently high accuracy. A new multiple-GPU parallel strategy is developed for a single-root-complex computer architecture, implemented purely within the CUDA environment. Peer-to-Peer (P2P) communication between the GPUs exchanges the information of crossing particles and ghost-element nodes, which is faster than the heavy send/receive operations between different computers over an InfiniBand network. Domain decomposition splits the whole computational task into a number of subdomains, and the computations within each subdomain are allocated to a corresponding GPU using an enhanced "Particle-List" scheme to resolve the data race during interpolation from associated particles to common nodes. The acceleration achieved by the parallelization is evaluated with two benchmark cases, a mini-slump test after a dam break and a cone penetration test in clay, for which the maximum speedups with 1 and 8 GPUs are 88 and 604, respectively.
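The following is a minimal sketch of the P2P exchange pattern described above, not the paper's actual implementation: it enables bidirectional peer access between two GPUs under the same root complex and copies a ghost-node buffer directly between devices with `cudaMemcpyPeerAsync`, avoiding any host staging. The buffer names `ghost0`/`ghost1` and the halo size are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical halo size (number of ghost-node values exchanged per step).
#define HALO_DOUBLES 4096

int main() {
    int nDev = 0;
    cudaGetDeviceCount(&nDev);
    if (nDev < 2) { printf("Need at least 2 GPUs\n"); return 0; }

    // Check and enable bidirectional peer access between GPU 0 and GPU 1.
    // P2P is typically available when both GPUs sit under one root complex.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) { printf("P2P unsupported\n"); return 0; }

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // GPU 0 may read/write GPU 1
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);  // GPU 1 may read/write GPU 0

    // Hypothetical ghost-element nodal buffers, one per subdomain/GPU.
    double *ghost0 = nullptr, *ghost1 = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&ghost0, HALO_DOUBLES * sizeof(double));
    cudaSetDevice(1);
    cudaMalloc(&ghost1, HALO_DOUBLES * sizeof(double));

    // Exchange ghost-node data device-to-device, with no host round trip.
    cudaStream_t s;
    cudaStreamCreate(&s);
    cudaMemcpyPeerAsync(ghost1, 1, ghost0, 0,
                        HALO_DOUBLES * sizeof(double), s);
    cudaStreamSynchronize(s);

    cudaStreamDestroy(s);
    cudaFree(ghost1);
    cudaSetDevice(0);
    cudaFree(ghost0);
    return 0;
}
```

In a full MPM step, a copy like this would run once per boundary per time step (for crossing particles and ghost-element nodes), which is why the direct PCIe path matters relative to inter-node send/receive traffic.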