2020 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra40945.2020.9196934

BatVision: Learning to See 3D Spatial Layout with Two Ears

Cited by 43 publications (24 citation statements)
References 16 publications
“…The work presented in [23] also supports our claim that the use of ultrasonic sensors can be extremely useful in situations where other sensors would not perform. They train a CNN to transform a binaural sound signal into a visual representation of the scene.…”
Section: A. BatVision (supporting)
confidence: 80%
“…Comparing the performance of our model to state-of-the-art models aimed at solving a similar problem is difficult, as to our knowledge no similar research results can be found transforming sonar measurements into LiDAR point cloud predictions. The work presented in [23] and [25] has a similar goal, but a direct comparison cannot be made as the used modalities are different.…”
Section: Results (mentioning)
confidence: 99%
“…Recent work also uses self-produced echolocation sounds produced by onboard speakers. Christensen et al [20] predict depth maps from real-world scenes using echo responses. Gao et al [21] learns visual representations by echolocation in a simulated environment [22].…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently, deep neural networks have been applied to this task as well. For instance, Christensen et al [56] used an encoder-decoder type of network to predict scene layouts with echos from two artificial human ears. This setting is further extended by integrating visual cues from a paired monocular camera [57].…”
Section: Related Work (mentioning)
confidence: 99%
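
To illustrate the kind of audio-to-depth architecture these citation statements refer to, the sketch below is a minimal, hypothetical encoder-decoder in PyTorch that maps a two-channel (binaural) echo spectrogram to a coarse depth map. The layer sizes, the 2x257x166 input shape, and the 128x128 output resolution are illustrative assumptions only, not details taken from BatVision or any of the citing papers.

```python
import torch
import torch.nn as nn


class EchoToDepth(nn.Module):
    """Toy encoder-decoder: binaural echo spectrogram in, depth map out (illustrative only)."""

    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions over the 2-channel (left/right ear) spectrogram.
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # bottleneck: (B, 128, 1, 1)
        )
        # Decoder: transposed convolutions upsample the bottleneck to a 128x128 depth map.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 128, 4), nn.ReLU(),                       # 1 -> 4
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 4 -> 8
            nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(),    # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),    # 32 -> 64
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # 64 -> 128
        )

    def forward(self, spectrogram):
        # spectrogram: (B, 2, freq_bins, time_frames)
        return self.decoder(self.encoder(spectrogram))


if __name__ == "__main__":
    # Example: a batch of 4 binaural spectrograms with 257 frequency bins
    # and 166 time frames (arbitrary illustrative shape).
    x = torch.randn(4, 2, 257, 166)
    depth = EchoToDepth()(x)
    print(depth.shape)  # torch.Size([4, 1, 128, 128])
```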