A guide to vision-based map building

Wooden, David

doi:10.1109/mra.2006.1638021

Cited by 26 publications

(12 citation statements)

References 9 publications

“…One of the main advantages of our method over these approaches is that it does not need a prior map of the environment. Another class of navigation methods reconstruct a map on the fly and use it for navigation [8], [9], [10], [11], or go through a training phase guided by humans to build the map [12], [13]. In contrast, our method does not require a map of the environment, as it does not have any assumption on the landmarks of the environment, nor does it require a human-guided training phase.…”

Section: Related Work (mentioning)

confidence: 99%

Target-driven visual navigation in indoor scenes using deep reinforcement learning

Zhu

Mottaghi

Kolve

et al. 2017

2017 IEEE International Conference on Robotics and Automation (ICRA)

1,317

1,060

Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new target goals, and (2) data inefficiency, i.e., the model requires several (and often costly) episodes of trial and error to converge, which makes it impractical to apply to real-world scenarios. In this paper, we address these two issues and apply our model to the task of target-driven visual navigation. To address the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows it to generalize better. To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. Our framework enables agents to take actions and interact with objects. Hence, we can collect a huge number of training samples efficiently. We show that our proposed method (1) converges faster than the state-of-the-art deep reinforcement learning methods, (2) generalizes across targets and across scenes, (3) generalizes to a real robot scenario with a small amount of fine-tuning (although the model is trained in simulation), and (4) is end-to-end trainable and does not need feature engineering, feature matching between frames, or 3D reconstruction of the environment. The supplementary video can be accessed at the following link: https://youtu.be/SmBxMDiOrvs. arXiv:1609.05143v1 [cs.CV]

“…Prominent early map-based navigation methods [47,6,7,64] use a global map to make decisions. More recent approaches [76,87,23,85,46,71] reconstruct the map on the fly. Simultaneous localization and mapping [84,74,24,12,67,77] consider mapping in isolation.…”

Section: Related Work (mentioning)

confidence: 99%

Two Body Problem: Collaborative Visual Task Completion

Jain

Weihs

Kolve

et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Figure 1: Two agents learn to successfully navigate through a previously unseen environment to find, and jointly lift, a heavy TV. Without learned communication, agents attempt many failed actions and pickups. With learned communication, agents send a message when they observe or when they intend to interact with the TV. The agents also learn to grab the opposite ends of the TV and coordinate to do so.

Abstract: Collaboration is a necessary skill to perform tasks that are beyond one agent's capabilities. Addressed extensively in both conventional and modern AI, multi-agent collaboration has often been studied in the context of simple grid worlds. We argue that there are inherently visual aspects to collaboration which should be studied in visually rich environments. A key element in collaboration is communication that can be either explicit, through messages, or implicit, through perception of the other agents and the visual world. Learning to collaborate in a visual environment entails learning (1) to perform the task, (2) when and what to communicate, and (3) how to act based on these communications and the perception of the visual world. In this paper we study the problem of learning to collaborate directly from pixels in AI2-THOR and demonstrate the benefits of explicit and implicit modes of communication to perform visual tasks. Refer to our project page for more details: https://prior.allenai.org/projects/two-body-problem

“…Offline map-based techniques [6,7,31,52] require the complete map of the environment to make any decisions about their actions, which limits their use in unseen environments. Online map-based methods [10,13,50,61,67,69] often construct the map while exploring the environment. The majority of these approaches use the computed map for navigation only, whereas our model constructs a rich semantic map which is used for navigation as well as planning and question answering.…”

Section: Related Work (mentioning)

confidence: 99%

IQA: Visual Question Answering in Interactive Environments

Gordon

Kembhavi

Rastegari

et al. 2018

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

316

295

We introduce Interactive Question Answering (IQA), the task of answering questions that require an autonomous agent to interact with a dynamic visual environment. IQA presents the agent with a scene and a question, like: "Are there any apples in the fridge?" The agent must navigate around the scene, acquire visual understanding of scene elements, interact with objects (e.g. open refrigerators) and plan for a series of actions conditioned on the question. Popular reinforcement learning approaches with a single controller perform poorly on IQA owing to the large and diverse state space. We propose the Hierarchical Interactive Memory Network (HIMN), consisting of a factorized set of controllers, allowing the system to operate at multiple levels of temporal abstraction. To evaluate HIMN, we introduce IQUAD V1, a new dataset built upon AI2-THOR [35], a simulated photo-realistic environment of configurable indoor scenes with interactive objects. IQUAD V1 has 75,000 questions, each paired with a unique scene configuration. Our experiments show that our proposed model outperforms popular single-controller-based methods on IQUAD V1. For sample questions and results, please view our video: https://youtu.be/pXd3C-1jr98.
