Abstract

In this paper, we propose an efficient parallel dynamic linear solver, called GPU-GMRES, for transient analysis of large linear dynamic systems such as large power grid networks. The new method is based on the preconditioned generalized minimum residual (GMRES) iterative method, implemented on heterogeneous CPU-GPU platforms. The solver is robust: it can be applied to power grids with different structures, as well as to general analysis problems for large linear dynamic systems with asymmetric matrices. GPU-GMRES adopts a general and robust incomplete LU (ILU) based preconditioner. We show that by properly selecting the amount of fill-in in the incomplete LU factors, a good trade-off between GPU efficiency and convergence rate can be achieved for the best overall performance. This tunable feature makes the algorithm adaptable to different problems. GPU-GMRES partitions the major computing tasks of the GMRES solver between the CPU and the GPUs so as to minimize the data traffic between them, which improves performance. Furthermore, we propose a new fast parallel sparse matrix-vector (SpMV) multiplication algorithm, called segSpMV, to further accelerate the GPU-GMRES solver. Unlike existing approaches, segSpMV achieves fully coalesced memory access. To improve scalability and efficiency, the segSpMV method is also extended to multi-GPU platforms, which leads to a faster and more scalable multi-GPU GMRES solver. Experimental results on the published IBM benchmark circuits and on mesh-structured power grid networks show that the GPU-GMRES solver can deliver orders-of-magnitude speedup over the direct LU solver UMFPACK. The resulting multi-GPU GMRES solver also delivers a 3-12× speedup over the CPU implementation of the same GMRES method on transient analysis.
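For readers unfamiliar with the SpMV operation that the abstract refers to, the following minimal sketch shows a plain sequential SpMV over the standard compressed sparse row (CSR) format. It is not the segSpMV algorithm and is not taken from the paper. The helper names `dense_to_csr` and `csr_spmv` and the 3×3 asymmetric matrix are hypothetical, chosen only for illustration. In a simple one-thread-per-row GPU mapping of this loop, rows of different lengths generally cause neighboring threads to read non-contiguous memory, which is the kind of access pattern that a coalescing-oriented scheme aims to avoid.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(
    dense: Sequence[Sequence[float]],
) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix to CSR arrays (values, col_idx, row_ptr).

    Hypothetical helper for illustration only.
    """
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0.0:
                values.append(float(a))
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in the values array.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(
    values: Sequence[float],
    col_idx: Sequence[int],
    row_ptr: Sequence[int],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for a CSR matrix, one row per outer iteration.

    Each outer iteration is independent of the others, so it corresponds
    to the unit of work a single thread would take in a naive
    one-thread-per-row parallel mapping (an assumption of this sketch).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small asymmetric example matrix (illustrative values only).
A = [
    [4.0, -1.0, 0.0],
    [-2.0, 5.0, -1.0],
    [0.0, -3.0, 6.0],
]
vals, cols, ptr = dense_to_csr(A)
y = csr_spmv(vals, cols, ptr, [1.0, 2.0, 3.0])
print(y)  # [2.0, 5.0, 12.0]
```

GMRES needs one such product per iteration to extend the Krylov basis, which is why the cost of SpMV bears directly on overall solver time.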