Traditional methods for measuring soil heavy metal content are expensive, inefficient, and limited in monitoring range. To meet the needs of soil environmental quality evaluation and health status assessment, visible and near-infrared (vis-NIR) spectroscopy and X-ray fluorescence (XRF) spectroscopy have attracted much attention for monitoring heavy metal content in soil because they are rapid, nondestructive, economical, and environmentally friendly. Neither type of spectrum alone can match the accuracy of traditional measurements, whereas using the two together can further improve the accuracy of monitoring lead (Pb) content in soil. Therefore, this study applied various spectral transformations and preprocessing steps to vis-NIR and XRF spectra; used the whale optimization algorithm (WOA) and competitive adaptive reweighted sampling (CARS) to identify feature spectra; designed a combination variable model (CVM) based on multi-layer spectral data fusion, which improved the spectral preprocessing and feature screening process to increase the efficiency of spectral fusion; and established a quantitative model for soil Pb concentration using partial least squares regression (PLSR). The estimation performance of three spectral fusion strategies, namely CVM, outer-product analysis (OPA), and Granger-Ramanathan averaging (GRA), was compared and discussed. The results showed that, in the fused-spectra estimation models, the CARS algorithm was superior to the WOA algorithm in both accuracy and efficiency, with an average coefficient of determination (R²) of 0.9226 and an average root mean square error (RMSE) of 0.1984. By spectral type, the accuracy of the estimation models for soil Pb content ranked as follows: CVM model > XRF spectral model > vis-NIR spectral model.
Within the CVM fusion strategy, the estimation model based on CARS and PLSR (CARS_D1+D2) performed best, with R² and RMSE values of 0.9546 and 0.2035, respectively. Among the three spectral fusion strategies, CVM had the highest accuracy, OPA had the smallest errors, and GRA showed a more balanced performance. This study provides a technical means for rapid on-site estimation of Pb content based on multi-source spectral fusion and lays the foundation for subsequent research on dynamic, real-time, and large-scale quantitative monitoring of soil heavy metal pollution using hyperspectral remote sensing images.
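
For readers unfamiliar with the evaluation metrics and the averaging strategy named above, the following is a minimal sketch, not the study's implementation. It computes R² and RMSE from their standard definitions and illustrates the idea behind Granger-Ramanathan averaging for two models by regressing the observed values on the two sets of predictions. The no-intercept variant, the function names, and all numeric values are assumptions made for illustration; none of them come from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def gra_weights(y_true, pred_a, pred_b):
    """Least-squares weights for y ~ w_a * pred_a + w_b * pred_b.

    Assumption: the unconstrained, no-intercept variant of
    Granger-Ramanathan averaging, solved via the 2x2 normal equations.
    """
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * y for a, y in zip(pred_a, y_true))
    sby = sum(b * y for b, y in zip(pred_b, y_true))
    det = saa * sbb - sab * sab
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


# Hypothetical values only: stand-ins for observed Pb content and for the
# predictions of two single-spectrum models (e.g. vis-NIR and XRF).
observed = [1.0, 2.0, 3.0, 4.0]
pred_visnir = [1.2, 1.8, 3.3, 3.9]
pred_xrf = [0.7, 2.3, 2.8, 4.2]

w_a, w_b = gra_weights(observed, pred_visnir, pred_xrf)
fused = [w_a * a + w_b * b for a, b in zip(pred_visnir, pred_xrf)]

print("weights:", w_a, w_b)
print("fused R2:", r2(observed, fused), "fused RMSE:", rmse(observed, fused))
```

Because the weights are fitted by least squares on the same samples, the fused RMSE on those samples cannot exceed that of either input model; performance on an independent validation set has to be checked separately.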