Monitoring and understanding the development of agricultural management requires fine-grained information on multiple agricultural land use classes. According to the main data bulletin of China's third national land survey, released in 2021, arable land accounts for 16% of China's total land area. After excluding areas unsuitable for development (slopes greater than 25 degrees), the remaining arable land, which is affected by human activities, constitutes 96.69% of the country's total arable land area. Artificially irrigated areas are products of the intertwined interactions between the natural environment and human society. However, the original remote sensing-based dataset (China's land use/cover datasets, CLUDs), which is designed to depict land use and cover patterns in mainland China, divides arable land into only two main types, paddy fields and dryland, and lacks a subdivision of artificially irrigated areas. We also found a discrepancy of more than 10% between rice statistics and the paddy field area in CLUDs. To refine land use data and thereby improve the simulation accuracy of hydrological models, this article proposes a big-data-based dual-source dataset fusion algorithm, the LUCC statistical data fusion (LUSF) algorithm, which integrates the remote sensing-based cropland area dataset with the statistical dataset. The runoff simulation results show that, in the Yangtze River Basin, using the LUSF datasets decreased the mean absolute percentage error of monthly simulated runoff by 0.74% and the root mean square error by 0.22 million m³. At the basin scale, the absolute error of the simulated runoff is reduced on average by 433 million m³ per year and 36 million m³ per month. The LUSF datasets also effectively corrected the canopy interception coefficient, reducing the runoff simulation error by 2.96 billion m³/a.
Over the past 40 years, runoff variation in Dongting Lake and the Hanjiang River has been the most strongly affected by changes in the underlying surface. These results indicate that the new data fusion method offers a significant improvement over the original method and is applicable to runoff simulation in areas strongly disturbed by human activities.
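The abstract reports improvements in the mean absolute percentage error (MAPE), root mean square error (RMSE), and absolute error of simulated runoff, and a greater-than-10% discrepancy between rice statistics and CLUDs paddy area, without spelling out the formulas. The sketch below is a minimal illustration, assuming the standard textbook definitions applied to paired observed and simulated monthly runoff series, and assuming the discrepancy is measured relative to the statistical value. The function names and sample numbers are hypothetical and are not the authors' code or data; this does not implement the LUSF algorithm itself.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], simulated: Sequence[float]) -> None:
    # Both series must be paired month by month (assumption).
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")


def mape(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Mean absolute percentage error in %, skipping zero observations."""
    _check(observed, simulated)
    terms = [abs(s - o) / abs(o) for o, s in zip(observed, simulated) if o != 0]
    return 100.0 * sum(terms) / len(terms)


def rmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Root mean square error, in the same unit as the runoff series."""
    _check(observed, simulated)
    sq = [(s - o) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(sq) / len(sq))


def mean_abs_error(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Mean absolute error, in the same unit as the runoff series."""
    _check(observed, simulated)
    return sum(abs(s - o) for o, s in zip(observed, simulated)) / len(observed)


def relative_discrepancy(statistical_area: float, remote_sensing_area: float) -> float:
    """Relative gap between a statistical area and a remote-sensing area,
    expressed as a fraction of the statistical value (assumption)."""
    return abs(statistical_area - remote_sensing_area) / statistical_area


if __name__ == "__main__":
    # Hypothetical monthly runoff values in million m³, for illustration only.
    obs = [100.0, 200.0, 400.0]
    sim = [110.0, 190.0, 400.0]
    print(f"MAPE = {mape(obs, sim):.2f}%")             # MAPE = 5.00%
    print(f"RMSE = {rmse(obs, sim):.2f}")              # RMSE = 8.16
    print(f"MAE  = {mean_abs_error(obs, sim):.2f}")    # MAE  = 6.67
    # Hypothetical areas: flag a gap larger than 10%.
    print(relative_discrepancy(100.0, 88.0) > 0.10)    # True
```

Comparing these metrics for runs driven by the original CLUDs and by a fused land use dataset, against the same observed series, is one straightforward way to quantify the kind of error reduction described above.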