Graphics processing units (GPUs) have attracted attention in many fields because they can perform massive amounts of computation at relatively low cost. This research proposes a performance acceleration framework for Monte Carlo-based earthquake source parameter estimation using multi-threaded compute unified device architecture (CUDA) GPUs. The Monte Carlo method imposes a heavy computational burden because the iterative nonlinear optimization is performed more than 1000 times. To alleviate this problem, we parallelize the rectangular dislocation model, i.e., the Okada model, since it consists of independent point-wise computations and accounts for most of the time spent in the nonlinear optimization. By adjusting the degree of common subexpression elimination, the thread block size, and constant caching, we obtained the best CUDA optimization configuration, which achieves 134.94×, 14.00×, and 2.99× speedups over the sequential CPU, 16-thread CPU, and baseline CUDA GPU implementations on a 1000×1000 mesh, respectively. We then evaluated the performance and correctness of four line search algorithms for the limited-memory Broyden–Fletcher–Goldfarb–Shanno with bound constraints (L-BFGS-B) optimization on a real earthquake dataset. The results showed the Armijo line search to be the most efficient of the four. Visualizations produced with the best-fit parameters derived by the proposed framework confirm that it approximates the earthquake source parameters in excellent agreement with the geodetic data, with a residual-displacement root-mean-square error (RMSE) of at most 0.5 cm.
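As a rough illustration of the parallelization strategy summarized above, the sketch below maps one CUDA thread to one point of an nx-by-ny observation mesh and places the trial source parameters in constant memory, which is broadcast-cached to all threads. It is a minimal sketch under assumed names: `okada_point`, `SourceParams`, `kSrc`, and all numeric values are hypothetical placeholders standing in for the much longer Okada (1985) surface-displacement expressions, not the paper's actual implementation.

```cuda
// Illustrative sketch only: okada_point() is a dummy stand-in for the
// independent point-wise Okada-model computation described in the text.
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

struct SourceParams {          // simplified, hypothetical fault description
    double depth;              // source depth (km)
    double slip;               // dislocation amount (m)
};

// Source parameters are identical for every mesh point, so constant
// memory is a natural placement (one of the tuning knobs mentioned above).
__constant__ SourceParams kSrc;

// Placeholder point-wise evaluation: each surface point is independent.
__device__ double okada_point(double x, double y)
{
    double r = sqrt(x * x + y * y + kSrc.depth * kSrc.depth);
    return kSrc.slip * kSrc.depth / (r * r * r + 1e-12);
}

// One thread per mesh node of the nx-by-ny observation grid.
__global__ void displacement_kernel(const double* xs, const double* ys,
                                    double* uz, int nx, int ny)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= nx || iy >= ny) return;
    uz[iy * nx + ix] = okada_point(xs[ix], ys[iy]);
}

int main()
{
    const int nx = 1000, ny = 1000;                 // 1000x1000 mesh
    std::vector<double> xs(nx), ys(ny), uz(nx * ny);
    for (int i = 0; i < nx; ++i) xs[i] = -50.0 + 0.1 * i;   // km
    for (int j = 0; j < ny; ++j) ys[j] = -50.0 + 0.1 * j;   // km

    SourceParams src = {10.0, 2.0};                 // arbitrary trial values
    cudaMemcpyToSymbol(kSrc, &src, sizeof(src));

    double *d_xs, *d_ys, *d_uz;
    cudaMalloc(&d_xs, nx * sizeof(double));
    cudaMalloc(&d_ys, ny * sizeof(double));
    cudaMalloc(&d_uz, nx * ny * sizeof(double));
    cudaMemcpy(d_xs, xs.data(), nx * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_ys, ys.data(), ny * sizeof(double), cudaMemcpyHostToDevice);

    // Thread-block size is another tuning knob discussed in the abstract.
    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    displacement_kernel<<<grid, block>>>(d_xs, d_ys, d_uz, nx, ny);

    cudaMemcpy(uz.data(), d_uz, nx * ny * sizeof(double),
               cudaMemcpyDeviceToHost);
    printf("uz at grid center: %g\n", uz[(ny / 2) * nx + nx / 2]);

    cudaFree(d_xs); cudaFree(d_ys); cudaFree(d_uz);
    return 0;
}
```

In an estimation loop, a kernel of this shape would be launched once per forward-model evaluation inside the L-BFGS-B iterations, so its per-launch cost dominates the overall runtime that the reported speedups target.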