Remote sensing imagery with high spatial and temporal resolution is essential for the rational planning and scientific management of land resources. However, because of limits on satellite sensor resolution, long revisit periods, and cloud contamination, such imagery is difficult to obtain. To address this “space–time contradiction” in remote sensing applications, this paper uses GF-2 PMS (GF-2) and PlanetScope (PS) data to compare the applicability of three spatiotemporal fusion models under different terrain conditions in a karst area: FSDAF (flexible spatiotemporal data fusion), STDFA (the spatial temporal data fusion approach), and Fit_FC (regression model fitting, spatial filtering, and residual compensation). The results show the following. (1) At water–land boundaries, the FSDAF model gives the best fusion result, delineating the land boundary most clearly and preserving rich ground-object information, whereas the Fit_FC model performs worst and produces a blurry image. (2) In areas with large changes in vegetation cover, such as mountains, all three models significantly improve the spatial resolution of the fused images. The STDFA model yields the clearest and richest spatial structure information. The Fit_FC fused image is the most similar to the verification image and best reproduces the cover changes of crops and other vegetation, but its effective spatial resolution is relatively poor: the image is blurry, and land boundaries cannot be clearly identified. (3) In densely built-up areas, such as cities, the FSDAF and STDFA fused images are clearer, while the Fit_FC model better reflects changes in land use.
In summary, compared with the Fit_FC model, the FSDAF and STDFA models have higher image prediction accuracy, especially in recognizing building contours and other surface features, but they are not suitable for the dynamic monitoring of vegetation such as crops. The Fit_FC model produces fused images of slightly lower resolution than the other two models, and its fusion accuracy is particularly poor at water–land boundaries, but it has unique advantages in dynamic vegetation monitoring. Fusing GF-2 and PS images with these three spatiotemporal fusion models improves the recognition accuracy of surface objects and offers a new approach to fine land-use classification in karst areas.
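
The abstract does not name the metrics used to judge how similar a fused image is to the verification image. As an illustration only, the sketch below computes two measures commonly used for this kind of comparison, the root-mean-square error (RMSE) and the Pearson correlation coefficient, for a single band. The function name `fusion_scores` and the toy arrays are hypothetical and are not taken from the paper.

```python
import numpy as np


def fusion_scores(fused, reference):
    """Return RMSE and Pearson correlation between two single-band images.

    Hypothetical helper for illustration; both inputs must have the same shape.
    """
    fused = np.asarray(fused, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if fused.shape != reference.shape:
        raise ValueError("fused and reference images must have the same shape")

    # Flatten so every pixel is treated as one paired observation.
    f = fused.ravel()
    r = reference.ravel()

    # RMSE: average magnitude of the per-pixel difference (lower is better).
    rmse = float(np.sqrt(np.mean((f - r) ** 2)))

    # Pearson correlation: linear agreement of the two images (closer to 1 is better).
    cc = float(np.corrcoef(f, r)[0, 1])

    return {"rmse": rmse, "cc": cc}


if __name__ == "__main__":
    # Toy 4x4 "verification" band and a "fused" band offset by a constant.
    reference = np.arange(16, dtype=np.float64).reshape(4, 4)
    fused = reference + 1.0
    print(fusion_scores(fused, reference))
```

A constant offset, as in the toy data, raises the RMSE while leaving the correlation at 1, which is why the two measures are usually reported together rather than alone.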