2019
DOI: 10.48550/arxiv.1905.04192
Preprint
Do Autonomous Agents Benefit from Hearing?

Abstract: Mapping states to actions in deep reinforcement learning is mainly based on visual information. The common approach to handling visual information is to extract pixels from images and use them as the state representation for the reinforcement learning agent. However, any vision-only agent is handicapped by its inability to sense audible cues. Using hearing, animals can sense targets outside their visual range. In this work, we propose the use of audio as complementary information to visual…
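As a rough illustration of the abstract's idea, the sketch below builds a joint state vector from a video frame and a raw audio window. The feature choices here (grayscale downsampling, magnitude-spectrum band pooling, and the function name `multimodal_state`) are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np

def multimodal_state(frame, audio_window, n_bands=32):
    """Combine pixel features and audio features into one RL state vector.

    Hypothetical helper illustrating audio as complementary information
    to vision; the exact preprocessing is an assumption, not the authors'.
    """
    # Visual features: grayscale, coarse 4x downsample, scale to [0, 1].
    gray = frame.mean(axis=-1) / 255.0          # (H, W)
    visual = gray[::4, ::4].ravel()

    # Audio features: magnitude spectrum pooled into n_bands coarse bands.
    spectrum = np.abs(np.fft.rfft(audio_window))
    bands = np.array_split(spectrum, n_bands)
    audio = np.array([b.mean() for b in bands])
    audio = audio / (audio.max() + 1e-8)        # normalize loudest band to ~1

    # Concatenating modalities gives the agent access to audible cues
    # (e.g. a target outside the visual field) alongside pixels.
    return np.concatenate([visual, audio])

state = multimodal_state(np.zeros((84, 84, 3)), np.random.randn(1024))
```

For an 84x84 RGB frame and a 1024-sample audio window this yields a 441 + 32 = 473-dimensional state, which a standard policy network could consume directly.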



Cited by 5 publications (6 citation statements) | References 12 publications
“…Learning to Navigate in 3D Environments: Realistic 3D environments and simulation platforms [1,11,84,76] egocentric visual observations [104,103,66,50,12,14] for performing question answering [43,26,78,25], instructions following [4,19,3], and active visual tracking [60,94,95]. Previous studies have shown audio as a strong cue for obstacle avoidance and navigation [77,44,33,70,64,32,83]. However, these methods are either non-photorealistic, not supporting AI agents, or rendering audio not geometrically and acoustically correct.…”
Section: Related Work
confidence: 99%
“…Park et al [11] introduced a general-purpose simulation platform based on Unity engine with both auditory and visual observations. While ViZDoom [7] supports in-game stereo sounds, the default audio subsystem is not designed for faster-than-realtime experience collection, and thus can only be used in relatively basic scenarios [12]. To our best knowledge, the version of ViZDoom presented in this work is the first simulation platform that enables accelerated embodied simulation with sounds at tens of thousands of actions per second, enabling large-scale training.…”
Section: Related Work
confidence: 99%
“…To our knowledge, there has been limited published work in the domains of audio-based navigation and audio source localization for methods based on reinforcement learning. The work in [2] combined deep reinforcement learning with environmental audio information in the context of navigation of a virtual environment by an autonomous agent. They discovered that an agent trained via reinforcement learning and leveraging raw audio information could more reliably reach a sound-emitting target within a maze than when only visual information was used.…”
Section: Related Work
confidence: 99%