This paper proposes a method for large-scale transient stability simulation based on the massively parallel architecture of multiple graphics processing units (GPUs). A robust and efficient parallel processing technique based on instantaneous relaxation, which features implicit integration, full Newton iteration, and a sparse LU-based linear solver, is used to run the multiple GPUs simultaneously. The implementation combines coarse-grained algorithm-level parallelism with the fine-grained data parallelism of the GPUs to accelerate large-scale transient stability simulation. Multi-threaded parallel programming makes the entire implementation highly transparent, scalable, and efficient. Several large test systems are used for the simulation; the largest has 9984 buses and 2560 synchronous generators, all modeled in detail, resulting in matrices larger than 20000×20000.

Index Terms: Graphics processors, instantaneous relaxation, large-scale systems, multiple GPUs, Newton-Raphson method, parallel multi-threaded programming, power system simulation, power system transient stability, sparse direct solvers.
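As a rough illustration of the numerical core named above (implicit integration, full Newton iteration, and an LU-type linear solve), the following minimal sketch integrates the swing equation of a single machine connected to an infinite bus. It is not the implementation described in this paper: the model, the parameter values (`H`, `D`, `PM`, `PMAX`), and the function names are all assumptions chosen for illustration, a small dense elimination stands in for the sparse LU solver, and neither the instantaneous relaxation partitioning nor any GPU execution is attempted.

```python
import math

# Hypothetical per-unit parameters for a single machine on an infinite bus.
WS = 2.0 * math.pi * 60.0   # synchronous speed (rad/s)
H = 3.5                     # inertia constant (s)
D = 2.0                     # damping coefficient (pu)
PM = 0.8                    # mechanical power (pu)
PMAX = 1.8                  # peak electrical power (pu)


def f(x):
    """Swing equation: x = [rotor angle (rad), speed deviation (pu)]."""
    delta, w = x
    return [WS * w, (PM - PMAX * math.sin(delta) - D * w) / (2.0 * H)]


def jac(x):
    """Analytic Jacobian df/dx of the swing equation."""
    delta, _ = x
    return [[0.0, WS],
            [-PMAX * math.cos(delta) / (2.0 * H), -D / (2.0 * H)]]


def lu_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting
    (the dense analogue of an LU factorization followed by substitution)."""
    n = len(A)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[p][k] == 0.0:
            raise ValueError("singular matrix")
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / A[i][i]
    return x


def trapezoidal_step(x, h, tol=1e-10, max_iter=20):
    """One implicit trapezoidal step, solved with full Newton iteration
    (the Jacobian is rebuilt and re-solved at every iteration)."""
    n = len(x)
    fx = f(x)
    y = x[:]  # initial guess: previous state
    for _ in range(max_iter):
        fy = f(y)
        # Residual of y - x - (h/2) * (f(y) + f(x)) = 0
        F = [y[i] - x[i] - 0.5 * h * (fy[i] + fx[i]) for i in range(n)]
        if max(abs(v) for v in F) < tol:
            return y
        Jf = jac(y)
        J = [[(1.0 if i == j else 0.0) - 0.5 * h * Jf[i][j]
              for j in range(n)] for i in range(n)]
        dy = lu_solve(J, [-v for v in F])
        y = [y[i] + dy[i] for i in range(n)]
    raise RuntimeError("Newton iteration did not converge")


def simulate(x0, h, steps):
    """Advance the state by `steps` trapezoidal steps of size h."""
    x = x0[:]
    for _ in range(steps):
        x = trapezoidal_step(x, h)
    return x
```

For example, `simulate([math.asin(PM / PMAX) + 0.2, 0.0], 0.01, 6000)` starts the rotor 0.2 rad away from its equilibrium angle and integrates for 60 s. With the damping assumed here, the angle settles back toward `math.asin(PM / PMAX)`. In a large multi-machine system the same step structure applies, but the Jacobian becomes large and sparse, which is where a sparse direct solver and parallel hardware become relevant.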