An echocardiogram is a video sequence of a human heart captured using ultrasound imaging. It shows the heart's structure and motion and aids in the diagnosis of cardiovascular diseases. Deep learning methods, which require large amounts of training data, have shown success in using echocardiograms to detect cardiovascular disorders such as valvular heart disease. However, large echocardiogram datasets suitable for training machine learning models are scarce. One way to address this problem is to use modern generative machine learning methods to produce synthetic echocardiograms for training. In this paper, we propose a video diffusion method for the generation of echocardiograms. Our method uses a 3D self-attention mechanism and a super-resolution model. We demonstrate that our proposed method generates echocardiograms with higher resolution and fewer artifacts than existing echocardiogram generation methods.