2020 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra40945.2020.9196934

BatVision: Learning to See 3D Spatial Layout with Two Ears

Cited by 43 publications (24 citation statements)
References 16 publications
“…The work presented in [23] also supports our claim that the use of ultrasonic sensors can be extremely useful in situations where other sensors would not perform. They train a CNN to transform a binaural sound signal into a visual representation of the scene.…”
Section: A. BatVision (supporting)
confidence: 80%
“…Comparing the performance of our model to state-of-the-art models aimed at solving a similar problem is difficult, as to our knowledge no similar research results can be found transforming sonar measurements into LiDAR point cloud predictions. The work presented in [23] and [25] has a similar goal, but a direct comparison cannot be made as the used modalities are different.…”
Section: Results (mentioning)
confidence: 99%
“…Recent work also uses self-produced echolocation sounds produced by onboard speakers. Christensen et al [20] predict depth maps from real-world scenes using echo responses. Gao et al [21] learns visual representations by echolocation in a simulated environment [22].…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently, deep neural networks have been applied to this task as well. For instance, Christensen et al [56] used an encoder-decoder type of network to predict scene layouts with echos from two artificial human ears. This setting is further extended by integrating visual cues from a paired monocular camera [57].…”
Section: Related Work (mentioning)
confidence: 99%
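
To illustrate the kind of audio-to-depth architecture these citation statements refer to, the sketch below is a minimal, hypothetical encoder-decoder in PyTorch that maps a two-channel (binaural) echo spectrogram to a coarse depth map. The layer sizes, the 2x257x166 input shape, and the 128x128 output resolution are illustrative assumptions only, not details taken from BatVision or any of the citing papers.

```python
import torch
import torch.nn as nn


class EchoToDepth(nn.Module):
    """Toy encoder-decoder: binaural echo spectrogram in, depth map out (illustrative only)."""

    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions over the 2-channel (left/right ear) spectrogram.
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # bottleneck: (B, 128, 1, 1)
        )
        # Decoder: transposed convolutions upsample the bottleneck to a 128x128 depth map.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 128, 4), nn.ReLU(),                       # 1 -> 4
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 4 -> 8
            nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(),    # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),    # 32 -> 64
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # 64 -> 128
        )

    def forward(self, spectrogram):
        # spectrogram: (B, 2, freq_bins, time_frames)
        return self.decoder(self.encoder(spectrogram))


if __name__ == "__main__":
    # Example: a batch of 4 binaural spectrograms with 257 frequency bins
    # and 166 time frames (arbitrary illustrative shape).
    x = torch.randn(4, 2, 257, 166)
    depth = EchoToDepth()(x)
    print(depth.shape)  # torch.Size([4, 1, 128, 128])
```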