The cocktail party problem refers to the challenge the human sensory system faces when separating a specific voice from a loud mixture of background sound sources. The problem is much more demanding for machines and has become the holy grail of robotic hearing. Despite many advances in noise suppression, the intrinsic information in the contaminated acoustic channel remains difficult to recover. Herein, a simple‐yet‐powerful laser‐assisted audio system termed robot ear accomplished by laser (REAL) is shown to probe the vibrations of sound‐carrying surfaces (mask, throat, and other nearby surfaces) in the optical channel, which is intrinsically immune to acoustic background noise. The results demonstrate that REAL can directly obtain the audio‐frequency content from the laser without acoustic channel interference. The signals can be further transcribed into human‐recognizable audio by exploiting the internal time and frequency correlations through memory‐enabled neural networks. The REAL system could enable a new mode of human–robot interaction. An interactive preprint version of the article can be found at: https://www.authorea.com/doi/full/10.1002/aisy.202200143.
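The abstract's central claim is that the optical channel preserves audio-frequency content even when the acoustic channel is swamped by interference. A toy simulation can illustrate the idea; all signals, frequencies, and amplitudes below are assumptions for illustration, not the authors' data or method:

```python
import math

def dominant_frequency(signal, fs):
    """Return the frequency (Hz) of the largest-magnitude DFT bin (naive O(N^2) DFT)."""
    n = len(signal)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip DC; positive frequencies only
        re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = re * re + im * im
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / n

fs = 8000        # sample rate in Hz (assumed)
n = 800          # 0.1 s of samples
f_voice = 440.0  # stand-in for an audio-frequency component carried by the surface

# Optical channel: the laser reads the surface vibration directly.
optical = [math.sin(2 * math.pi * f_voice * t / fs) for t in range(n)]

# Acoustic channel: the same component buried under a much louder interferer.
f_noise = 1300.0
acoustic = [s + 5.0 * math.sin(2 * math.pi * f_noise * t / fs)
            for t, s in enumerate(optical)]

print(dominant_frequency(optical, fs))   # 440.0: the voice component survives
print(dominant_frequency(acoustic, fs))  # 1300.0: the interferer dominates
```

The point of the sketch is only the channel separation: the dominant frequency read from the optical channel is untouched by acoustic interference, which is why the downstream neural-network transcription can start from a clean signal.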
This Supporting Information includes: a comparison of REAL (Robot Ear Accomplished by Laser) with a typical vibration-measuring system (laser Doppler vibrometry, LDV), the frequency response of various materials under REAL, and a real-time analysis of the REAL audio neural-network model. Xiaoping Hong, Email: hongxp@sustech.edu.cn
The increasing popularity of small drones has created an urgent need for an effective drone‐oriented surveillance system that can work day and night. Herein, an acoustic and optical sensor‐fusion‐based system, termed multimodal unmanned aerial vehicle 3D trajectory exposure system (MUTES), is presented to detect and track drone targets. MUTES combines multiple sensor modules including a microphone array, a camera, and a lidar. The 64‐channel microphone array provides semispherical surveillance with a high signal‐to‐noise ratio for sound‐source estimation, while the long‐range lidar and the telephoto camera perform subsequent precise target localization in a narrower but higher‐definition field of view. MUTES employs a coarse‐to‐fine, passive‐to‐active localization strategy for wide‐range (semispherical) detection and high‐precision 3D tracking. To further increase fidelity, an environmental denoising model is trained that selects valid acoustic features from a drone target, thus overcoming the drawbacks of traditional sound‐source localization approaches under noise interference. The effectiveness of the proposed sensor‐fusion approach is validated through field experiments. To the best of our knowledge, MUTES provides the farthest detection range, the highest 3D position accuracy, strong anti‐interference capability, and acceptable cost for detecting unverified drone intruders.
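The coarse‐to‐fine strategy begins with an acoustic bearing estimate that points the narrower lidar and camera at the target. A minimal sketch of such a coarse stage, using a classic two‐microphone time‐difference‐of‐arrival (TDOA) estimate rather than MUTES's actual 64‐channel array; the sample rate, microphone spacing, and test signal are all assumptions for illustration:

```python
import math
import random

def tdoa_bearing(sig_a, sig_b, fs, mic_spacing, c=343.0):
    """Coarse direction of arrival from a two-microphone time-difference estimate.

    Cross-correlates the channels over physically possible lags, takes the
    best-matching lag as the inter-microphone delay tau, and converts it to a
    bearing angle in degrees via sin(theta) = c * tau / d.
    """
    n = len(sig_a)
    max_lag = int(mic_spacing / c * fs) + 1   # only physically possible delays
    best_lag, best_corr = 0, -float("inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = sum(sig_a[t] * sig_b[t - lag]
                   for t in range(max(0, lag), min(n, n + lag)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    s = max(-1.0, min(1.0, c * best_lag / fs / mic_spacing))
    return math.degrees(math.asin(s))

fs = 48000     # sample rate in Hz (assumed)
d = 0.5        # microphone spacing in metres (assumed)
delay = 40     # true inter-microphone delay in samples

random.seed(0)  # deterministic broadband signal standing in for drone noise
base = [random.random() - 0.5 for _ in range(2048)]
sig_b = base
sig_a = [0.0] * delay + base[:-delay]   # mic A hears the source 40 samples later

print(tdoa_bearing(sig_a, sig_b, fs, d))  # coarse bearing, roughly 35 degrees
```

A real array generalizes this pairwise estimate across 64 channels for a semispherical bearing; the resulting coarse direction then bounds the search volume handed to the lidar and telephoto camera for fine 3D localization.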