Artifact contamination in electroencephalogram (EEG) signals is a significant problem, especially in naturalistic settings where participants can move freely. This contamination stems from many sources, such as eye movements, muscle activity, sweat, and electrical interference, whose effects on the signal differ greatly from one another. Traditional denoising methods such as Independent Component Analysis (ICA) are limited because they assume that artifact sources mix linearly into the recorded EEG, and they often require one noise source to dominate the others. Moreover, these methods require expert knowledge of EEG analysis and lack an objective evaluation standard. To overcome these challenges, we propose two innovations. First, we introduce "video-estimated" pose coordinates, the x and y positions of body keypoints such as the wrists, eyes, and ankles, to assist the EEG denoising process. Second, we present EEG-DDM, a denoising diffusion model that uses both the contaminated EEG signals and these pose coordinates to denoise the EEG. Our findings show that incorporating keypoints (pose coordinates) improves denoising performance and helps preserve cross-spatial dependencies in the data. Additionally, we improve the human interpretability of the process by presenting saliency maps generated by our model, which explain how these keypoints contribute to denoising.
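The abstract does not specify EEG-DDM's architecture, noise schedule, or how the keypoints are fed to the network. The following is a minimal sketch of one *assumed* training step in a standard DDPM-style setup: the clean EEG is noised with the closed-form forward process, and the contaminated EEG plus flattened pose coordinates are stacked along the channel axis to form a conditioning input. All function names (`linear_beta_schedule`, `q_sample`, `build_condition`, `training_example`, `epsilon_loss`), the linear beta schedule, and the channel-concatenation conditioning are hypothetical illustrations and are not the authors' method.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Assumed linear noise schedule over T diffusion steps.
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alphas_cumprod, noise):
    # Closed-form forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise.
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

def build_condition(eeg_contaminated, keypoints):
    # Hypothetical conditioning: eeg_contaminated is (channels, time),
    # keypoints is (n_points, 2, time) holding x/y per body point.
    # Flatten keypoints to (2 * n_points, time) and stack under the EEG channels.
    kp_flat = keypoints.reshape(-1, keypoints.shape[-1])
    return np.concatenate([eeg_contaminated, kp_flat], axis=0)

def training_example(clean_eeg, contaminated_eeg, keypoints, T=1000, rng=None):
    # Produce one (x_t, t, condition, target noise) tuple for training a
    # noise-prediction network eps_theta(x_t, t, condition) (network not shown).
    if rng is None:
        rng = np.random.default_rng(0)
    alphas_cumprod = np.cumprod(1.0 - linear_beta_schedule(T))
    t = int(rng.integers(0, T))
    noise = rng.standard_normal(clean_eeg.shape)
    x_t = q_sample(clean_eeg, t, alphas_cumprod, noise)
    cond = build_condition(contaminated_eeg, keypoints)
    return x_t, t, cond, noise

def epsilon_loss(eps_pred, noise):
    # Standard noise-prediction objective: mean squared error.
    return float(np.mean((eps_pred - noise) ** 2))
```

For example, with 8 EEG channels, 256 time samples, and 17 keypoints, `build_condition` yields a 42-row conditioning array (8 EEG rows plus 34 x/y rows). Channel concatenation is only the simplest way to give the denoiser access to pose information; cross-attention or separate encoders are common alternatives, and the abstract does not say which EEG-DDM uses.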