Deep learning techniques for radar echo extrapolation and prediction have become crucial for short-term precipitation forecasting in recent years. As the extrapolation lead time increases, however, the predicted radar echo intensity attenuates progressively, and forecast performance on strong echoes declines rapidly. These two characteristics are typical causes of the inaccuracy of current radar extrapolation results. To address them, we propose a novel diffusion radar echo extrapolation (DiffREE) algorithm driven by echo frames. The algorithm integrates the spatio-temporal information of radar echo frames through a conditional encoding module and then uses a Transformer encoder to automatically extract the spatio-temporal features of the echoes. These features serve as inputs to the conditional diffusion model, driving the model to reconstruct the current radar echo frame. A validation experiment demonstrates that the proposed method can generate high-precision, high-quality forecast images of radar echoes. To further substantiate the model's performance, DiffREE is compared with four other models on public datasets. In the radar echo extrapolation task, DiffREE improves the critical success index, equitable threat score, Heidke skill score, and probability of detection by 21.5%, 27.6%, 25.8%, and 21.8%, respectively.
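
The four evaluation metrics named above are standard contingency-table scores computed after thresholding the forecast and observed echo fields. The following is a minimal sketch of how they are commonly defined, not the evaluation code used in this study; the function name `contingency_scores`, the use of NumPy arrays, and the 30 dBZ threshold in the usage lines are illustrative assumptions.

```python
import numpy as np


def contingency_scores(forecast, observed, threshold):
    """Return POD, CSI, ETS and HSS for one intensity threshold.

    Hypothetical helper: both inputs are arrays of echo intensity with the
    same shape; a pixel counts as an "event" when its value >= threshold.
    """
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold

    hits = float(np.sum(f & o))            # event forecast and observed
    misses = float(np.sum(~f & o))         # event observed, not forecast
    false_alarms = float(np.sum(f & ~o))   # event forecast, not observed
    correct_neg = float(np.sum(~f & ~o))   # no event in either field
    total = hits + misses + false_alarms + correct_neg

    def ratio(num, den):
        # Return NaN instead of raising when a score is undefined.
        return num / den if den > 0 else float("nan")

    # Hits expected by chance, used by the equitable threat score.
    hits_random = ratio((hits + misses) * (hits + false_alarms), total)

    return {
        "POD": ratio(hits, hits + misses),
        "CSI": ratio(hits, hits + misses + false_alarms),
        "ETS": ratio(hits - hits_random,
                     hits + misses + false_alarms - hits_random),
        "HSS": ratio(2.0 * (hits * correct_neg - misses * false_alarms),
                     (hits + misses) * (misses + correct_neg)
                     + (hits + false_alarms) * (false_alarms + correct_neg)),
    }


# Toy 2x2 fields (values assumed to be reflectivity in dBZ).
forecast = np.array([[40.0, 10.0], [35.0, 20.0]])
observed = np.array([[45.0, 30.0], [10.0, 5.0]])
scores = contingency_scores(forecast, observed, threshold=30.0)
```

In this toy case there is one hit, one miss, one false alarm, and one correct negative, so POD is 0.5 and CSI is 1/3, while ETS and HSS are both 0 because the single hit is no better than chance.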