“…Currently, most long-distance voice acquisition technologies based on laser interference work on the same principle: an infrared laser irradiates a static target placed in a distant sound field, coherent demodulation recovers the target's vibration along the direction of the beam, and a speech enhancement method then denoises the vibration signal to restore the speech in the sound field [3][4]. There are currently two main categories of speech enhancement methods for long-distance laser speech. The first comprises conventional single-channel methods based on Wiener filtering, wavelet thresholding, etc.; such methods presuppose that the signal and noise meet certain correlation conditions [5][6]. The second comprises data-driven deep learning methods, which require the construction of an appropriate speech dataset for learning complex mapping relationships [7][8][9]. These methods can extract reasonably clear speech from the distant sound field for static targets in the laboratory. In real-world applications, however, a suitable static target in the distant sound field is not always available.…”