Identifying groundwater contamination sources is an important prerequisite for subsequent remediation. To overcome the limitations of single inversion methods, this study proposed a two-stage inversion framework that integrates the two primary inversion approaches, simulation-optimization and simulation-data assimilation, to improve inversion accuracy. In the first stage, the ensemble smoother with multiple data assimilation (a simulation-data assimilation method) performed a broad global search to provide better initial values and search ranges for the second stage. In the second stage, a collective decision optimization algorithm (a simulation-optimization method) performed a refined, in-depth search, further improving the final inversion accuracy. In addition, a deep learning method, the multilayer perceptron, was used to build a surrogate of the simulation model, reducing the computational cost. The framework was applied and validated in a hypothetical scenario involving the simultaneous identification of the contamination source and boundary conditions. The results showed that the proposed two-stage framework significantly improved search accuracy compared with single inversion methods, achieving a mean relative error of 4.95% and a mean absolute error of 0.1756. Moreover, the multilayer perceptron surrogate approximated the simulation model more accurately than the traditional shallow learning surrogate model, with a coefficient of determination of 0.9860, a mean relative error of 9.72%, a mean absolute error of 0.1727, and a root mean square error of 0.47. These findings can provide more reliable technical support for practical applications and improve the efficiency of subsequent remediation.
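The four accuracy measures quoted for the surrogate model (coefficient of determination, mean relative error, mean absolute error, and root mean square error) can be computed as in the minimal sketch below. The abstract does not state their formulas, so the common textbook definitions are assumed here. The function name `surrogate_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def surrogate_metrics(simulated, predicted):
    """Compare surrogate predictions with simulation-model outputs.

    Assumed (textbook) definitions, not confirmed by the source:
      R^2  = 1 - SS_res / SS_tot
      MRE  = mean(|pred - sim| / |sim|), returned as a fraction
      MAE  = mean(|pred - sim|)
      RMSE = sqrt(mean((pred - sim)^2))
    Requires non-zero, non-constant simulated values.
    """
    if len(simulated) != len(predicted) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(simulated)
    errors = [p - s for s, p in zip(simulated, predicted)]
    mean_sim = sum(simulated) / n

    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((s - mean_sim) ** 2 for s in simulated)   # total sum of squares

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mre": sum(abs(e) / abs(s) for e, s in zip(errors, simulated)) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
    }


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study.
    simulated = [2.0, 4.0, 6.0, 8.0]
    predicted = [2.1, 3.8, 6.3, 7.9]
    for name, value in surrogate_metrics(simulated, predicted).items():
        print(f"{name}: {value:.4f}")
```

The mean relative error is returned as a fraction, so it must be multiplied by 100 to be compared with a percentage such as the 9.72% reported above. The same function could be applied to the identified versus true parameter values to obtain inversion errors, assuming the study uses the same definitions there.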