Mosaicking of remote sensing images stitches images acquired at different times or by different sensors into a new image under a uniform geographic coordinate system. In the mosaicking process, the critical enblending operation is divided into color balancing, seamline detection, and fusion of overlapping areas, and it remains challenging to maintain color consistency and data fidelity. In this paper, a new mosaicking framework based on spatiotemporal fusion is proposed to address the enblending issue. Two additional low-resolution reference images are introduced for each image to be mosaicked. Using spatiotemporal fusion, all images to be mosaicked are reconstructed to a uniform time, after which combining the overlapping areas becomes straightforward. Furthermore, a new spatiotemporal fusion method that cascades enhanced deep neural networks is proposed to fuse images quickly and effectively. For validation, the proposed method is compared with eight color harmonization methods or tools by mosaicking the red, green, and blue bands of Landsat-8 images, with images from the Moderate Resolution Imaging Spectroradiometer (MODIS) as the reference. The quantitative evaluations and visual comparisons show that the new method outperforms most of the compared methods in terms of radiometric, structural, and spectral fidelity, which confirms the feasibility of the new enblending method.