Mode-division multiplexing (MDM) is seen as a possible solution to satisfy the rising capacity demands of optical communication networks. To make MDM a success, fibers that support the propagation of a large number of modes are of interest. Many system aspects of the propagation can be evaluated with appropriate models. However, because fibers are a nonlinear medium, numerical simulations are required. For a large number of modes, simulating the nonlinear signal propagation poses new challenges, for example regarding the required memory, which we address with an implementation that uses multiple GPU accelerators. In this paper, we evaluate two different approaches to realizing the communication between the GPUs and analyze the performance of simulations involving up to 8 Tesla GPUs. As an application example, we show results for an MDM transmission system with 120 spatial modes, a very large but practically relevant number, and analyze the impact of nonlinear effects on the transmitted signals.