This study comprehensively evaluates eight satellite-based precipitation datasets in streamflow simulations for a monsoon-climate watershed in China. Two mutually independent datasets, one dense-gauge and one gauge-interpolated, are used as references, because commonly used gauge-interpolated datasets are built from sparse networks and may therefore be biased and unable to reflect the real performance of satellite-based precipitation. The dense-gauge dataset includes a substantial number of gauges and can thus better represent the spatial variability of precipitation. The eight satellite-based precipitation datasets comprise two raw satellite datasets, Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) and the Climate Prediction Center MORPHing raw satellite dataset (CMORPH RAW); four satellite-gauge datasets, Tropical Rainfall Measuring Mission 3B42 (TRMM), PERSIANN Climate Data Record (PERSIANN CDR), CMORPH bias-corrected (CMORPH CRT), and CMORPH gauge-blended (CMORPH BLD); and two satellite-reanalysis-gauge datasets, Multi-Source Weighted-Ensemble Precipitation (MSWEP) and Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS). The uncertainty related to hydrological model physics is investigated using two different hydrological models. A set of statistical indices is used to evaluate the precipitation datasets from different perspectives, including detection capability, systematic error, random error, and precision in capturing extreme precipitation. Results show that CMORPH BLD and MSWEP generally perform better than the other datasets. In terms of hydrological simulations, all satellite-based datasets show a significant dampening of the random error during the transformation from precipitation to runoff; however, this dampening does not hold for the systematic error.
Although the two hydrological models introduce uncertainty into the simulated hydrological processes, the relative hydrological performance of the satellite-based datasets is consistent across both models. Specifically, CMORPH BLD performs best, followed by MSWEP, CMORPH CRT, and TRMM. PERSIANN CDR and CHIRPS perform moderately well, while the two raw satellite datasets are not recommended as proxies for gauge observations because of their poorer performance.
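
To illustrate the evaluation perspectives mentioned above (detection, systematic error, and random error), the following minimal sketch computes one commonly used index of each kind for a satellite series against a reference series. The specific indices (probability of detection, false alarm ratio, critical success index, relative bias, and a regression-based split of the mean squared error), the 1 mm rain/no-rain threshold, the function names, and the sample values are all assumptions chosen for illustration; they are not necessarily the indices or settings used in this study.

```python
from statistics import fmean


def detection_scores(sat, ref, threshold=1.0):
    """Return (POD, FAR, CSI) for rain events at or above `threshold` (assumed mm/day)."""
    pairs = list(zip(sat, ref))
    hits = sum(1 for s, r in pairs if s >= threshold and r >= threshold)
    misses = sum(1 for s, r in pairs if s < threshold and r >= threshold)
    false_alarms = sum(1 for s, r in pairs if s >= threshold and r < threshold)
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return pod, far, csi


def relative_bias(sat, ref):
    """Total-volume bias of the satellite series relative to the reference."""
    return (sum(sat) - sum(ref)) / sum(ref)


def error_decomposition(sat, ref):
    """Split the MSE into systematic and random parts.

    A least-squares line sat_hat = a + b * ref is fitted; the systematic part
    is the mean squared distance of sat_hat from ref, and the random part is
    the mean squared scatter of sat around sat_hat. The two parts sum to the MSE.
    """
    mean_s, mean_r = fmean(sat), fmean(ref)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(sat, ref))
    slope = cov / var_r
    intercept = mean_s - slope * mean_r
    fitted = [intercept + slope * r for r in ref]
    mse_sys = fmean([(f - r) ** 2 for f, r in zip(fitted, ref)])
    mse_rand = fmean([(s - f) ** 2 for s, f in zip(sat, fitted)])
    return mse_sys, mse_rand


if __name__ == "__main__":
    # Hypothetical daily precipitation values in mm, for illustration only.
    ref = [0.0, 2.0, 5.0, 0.0, 10.0, 0.5]
    sat = [0.0, 3.0, 4.0, 1.5, 12.0, 0.0]
    print("POD, FAR, CSI:", detection_scores(sat, ref))
    print("Relative bias:", relative_bias(sat, ref))
    print("Systematic, random MSE:", error_decomposition(sat, ref))
```

Under this kind of decomposition, a dampening of the random error in the simulated streamflow would appear as a smaller random share of the total error for runoff than for precipitation, while the systematic share would not shrink in the same way.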