Muscles are present throughout the active parts of the body, such as the upper and lower limbs, which makes electromyography-based human-machine interaction prevalent. However, muscle signals are stochastic and noisy, and the noise can be either regular or irregular. Irregular noise caused by movement or electrical switching requires dynamic filtering. Conventionally, filters are stacked, which unnecessarily trims and delays the signal. This study introduces a decontamination technique in which a supervised rewarding strategy drives a deep Q-network-based agent (supDQN). The agent applies one of three filters to decontaminate a 1 s long, dynamically contaminated surface electromyography signal. A machine learning agent identifies whether the filtered signal is clean or noisy and generates a reward accordingly. The identification accuracy is enhanced by using local interpretable model-agnostic explanations (LIME). Guided by this reward, the deep Q-network selects the optimal filter while decontaminating a signal. The proposed filtering strategy is tested at four noise levels (-5 dB, -1 dB, +1 dB, +5 dB). supDQN filters the signal desirably when the signal-to-noise ratio (SNR) is between -5 dB and +1 dB but less desirably at high SNR (+5 dB). A normalized root mean square (Ω) is formulated to quantify the difference between the filtered signal and the ground truth. It is used to compare supDQN with conventional methods, including wavelet denoising with Daubechies and symlet wavelets, a high-order low-pass filter, a notch filter, and a high-pass filter. The proposed filtering strategy gives an average Ω value of 1.1974, which is lower than that of the conventional filters.
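
As a rough illustration of the evaluation setup described above, the following minimal sketch contaminates a 1 s window at a chosen SNR and scores it against the ground truth with a normalized root-mean-square difference. The exact definition of Ω is not given here, so the form used below (RMS of the difference divided by RMS of the ground truth) is an assumption, as are the white Gaussian noise model, the 1000 Hz sampling rate, the synthetic test signal, and the helper names `contaminate` and `normalized_rms_difference`.

```python
import math
import random


def rms(x):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def contaminate(clean, snr_db, rng):
    """Add white Gaussian noise, rescaled so the result has exactly snr_db.

    Assumption: additive white noise; the study's contamination may differ.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    # SNR (dB) = 20 * log10(rms_signal / rms_noise)
    target_noise_rms = rms(clean) / (10.0 ** (snr_db / 20.0))
    scale = target_noise_rms / rms(noise)
    return [c + scale * n for c, n in zip(clean, noise)]


def normalized_rms_difference(filtered, reference):
    """Assumed form of the metric: RMS(filtered - reference) / RMS(reference)."""
    diff = [f - r for f, r in zip(filtered, reference)]
    return rms(diff) / rms(reference)


rng = random.Random(0)
fs = 1000  # assumed sampling rate in Hz, giving a 1 s window of 1000 samples
# Synthetic stand-in for a muscle burst: an 80 Hz tone with a smooth envelope.
clean = [
    math.sin(2 * math.pi * 80 * n / fs) * math.sin(math.pi * n / fs) ** 2
    for n in range(fs)
]

for snr_db in (-5, -1, 1, 5):
    noisy = contaminate(clean, snr_db, rng)
    print(snr_db, normalized_rms_difference(noisy, clean))
```

Under these assumptions, the score of an unfiltered signal equals 10^(-SNR/20), so a filter helps only if it pushes the value below that baseline for the given noise level.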