Neurosurgical procedures in which electrodes are placed in the brains of awake patients offer remarkable opportunities to discover the neurophysiology underlying human speech. The relative scarcity of these opportunities and the altruism of participating patients obligate us to apply the highest possible rigor to signal interpretation. Intracranial electroencephalography (iEEG) signals recorded during overt speech can contain an acoustic-induced vibration artifact that tracks the fundamental frequency (F0) of the participant's voice and encompasses the high-gamma frequencies used to index neural activation during speech production and perception. To advance our understanding of the neural control of speech production and to develop reliable speech models, we developed a spatial filtering approach that identifies and removes acoustic-induced artifactual components of the recorded signal. We show that traditional reference schemes may jeopardize signal quality, whereas our data-driven method can denoise the recording while preserving signals from the underlying neural activity.