High spatio–temporal resolution remote sensing images are of great significance for the dynamic monitoring of the Earth’s surface. However, because of cloud contamination and the hardware limitations of sensors, it is difficult to obtain image sequences with both high spatial and high temporal resolution. Combining coarse resolution images, such as those from the Moderate Resolution Imaging Spectroradiometer (MODIS), with fine spatial resolution images, such as those from Landsat or Sentinel-2, has become a popular way to address this problem. In this paper, we propose a simple and efficient enhanced linear regression spatio–temporal fusion method (ELRFM). ELRFM uses fine spatial resolution images acquired on two reference dates to establish, for each pixel and each band, a linear regression model between image reflectance and acquisition date. The resulting regression coefficients are then used to help allocate the residual between the real coarse resolution image and a simulated coarse resolution image, which is obtained by upscaling the fine resolution linear prediction. The method consists of four steps: (1) linear regression (LR), (2) residual calculation, (3) residual distribution and (4) singular value correction. The method was tested in different areas and with different sensors. The results show that, compared with the spatial and temporal adaptive reflectance fusion model (STARFM) and the flexible spatio–temporal data fusion (FSDAF) method, ELRFM better captures small feature changes at the fine image scale and achieves high prediction accuracy. For example, in the red band, the proposed method has the lowest root mean square error (RMSE) (ELRFM: 0.0123; STARFM: 0.0217; FSDAF: 0.0224; LR: 0.0221). Furthermore, owing to its lightweight algorithm design and its implementation on Google Earth Engine, the proposed method is computationally less expensive than STARFM and FSDAF.
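The four steps can be illustrated with a minimal single-band NumPy sketch. This is not the authors' implementation, and several details are assumptions. The abstract does not say exactly how the regression coefficients weight the residual, so the sketch hypothetically distributes each coarse residual in proportion to the absolute regression slope, normalised within each coarse pixel. It also does not define the singular value correction, so the sketch assumes that it replaces physically implausible reflectances (outside [0, 1]) with the clipped LR prediction. It further assumes that the coarse grid is an exact s × s block average of the fine grid, with no co-registration or sensor-difference handling. The names `block_mean` and `elrfm_sketch` are illustrative only.

```python
import numpy as np


def block_mean(img: np.ndarray, s: int) -> np.ndarray:
    """Upscale a fine (H, W) band to the coarse grid by averaging s x s blocks.

    Assumes H and W are exact multiples of s.
    """
    h, w = img.shape
    return img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))


def elrfm_sketch(f1, f2, t1, t2, coarse_p, tp, s):
    """Hypothetical single-band ELRFM-style fusion (illustrative, not the paper's code).

    f1, f2   : fine reflectance at reference dates t1 and t2, shape (H, W)
    coarse_p : coarse reflectance at the prediction date tp, shape (H // s, W // s)
    s        : integer ratio between coarse and fine pixel sizes
    """
    # (1) Per-pixel linear regression of reflectance on acquisition date.
    #     With only two reference dates, the fitted line passes through both points.
    slope = (f2 - f1) / (t2 - t1)
    intercept = f1 - slope * t1
    f_lr = slope * tp + intercept

    # (2) Residual between the real coarse image and the upscaled LR prediction.
    residual = coarse_p - block_mean(f_lr, s)

    # (3) Distribute each coarse residual to its fine pixels.
    #     ASSUMPTION: weights proportional to |slope|, normalised to a block mean
    #     of 1, so the upscaled fused image reproduces the coarse image exactly.
    abs_slope = np.abs(slope)
    mean_abs = np.repeat(np.repeat(block_mean(abs_slope, s), s, axis=0), s, axis=1)
    weights = np.ones_like(abs_slope)  # uniform weights where a block has no change
    np.divide(abs_slope, mean_abs, out=weights, where=mean_abs > 0)
    residual_fine = np.repeat(np.repeat(residual, s, axis=0), s, axis=1)
    fused = f_lr + weights * residual_fine

    # (4) "Singular value" correction.
    #     ASSUMPTION: reflectances outside [0, 1] fall back to the clipped LR prediction.
    bad = (fused < 0.0) | (fused > 1.0)
    return np.where(bad, np.clip(f_lr, 0.0, 1.0), fused)
```

For example, with `f1` at 0.10 everywhere, `f2` at 0.20 except one pixel at 0.30, `t1 = 0`, `t2 = 10`, `tp = 5` and `s = 2`, the changed pixel receives a larger share of its coarse pixel's residual than its neighbours. For multispectral data, the function would simply be applied band by band. Weighting by |slope| reflects the intuition that pixels which changed more between the reference dates are more likely to explain an unexpected coarse-scale change, but other weighting schemes are equally plausible.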