Spatiotemporal fusion (STF) is a cost-effective way to combine the complementary spatial and temporal resolutions of multi-source images, and it has been employed in various applications that require image sequences. In real-world applications, the spectral accuracy, spatial accuracy and efficiency of STF all play a critical role. Nevertheless, most STF methods focus on improving spectral accuracy, while the challenges of spatial information loss and low efficiency have received limited attention. Moreover, improvements in spectral accuracy, spatial accuracy and efficiency tend to conflict with one another, and existing STF methods do not balance them well, which limits their reliability and applicability across STF tasks. To address these issues, this study proposes an object-level hybrid spatiotemporal fusion method (OL-HSTFM), which combines the efficiency advantage of the object-level fusion strategy, the spectral accuracy advantage of the three-step method (Fit-FC), and the spatial accuracy advantage of the spatial and temporal adaptive reflectance fusion model (STARFM). The performance of OL-HSTFM was compared with that of two classic STF methods and eight state-of-the-art STF methods at two sites. The experimental results indicate that OL-HSTFM outperforms the other 10 methods in overall performance and is highly efficient. Furthermore, this study proposes a new metric that assesses accuracy in both the spatial and spectral domains of STF, providing a more comprehensive and intuitive measure of fused image quality than commonly used metrics. The OL-HSTFM program is openly available at https://github.com/Andy-cumt/Object-levelspatiotemporal-fusion-models.