Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/626

Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents

Abstract: We investigate the task of learning to interpret natural language instructions by jointly reasoning with visual observations and language inputs. Unlike current methods, which start with learning from demonstrations (LfD) and then use reinforcement learning (RL) to fine-tune the model parameters, we propose a novel policy optimization algorithm which can dynamically schedule demonstration learning and RL. The proposed training paradigm provides efficient exploration and better generalization beyond existing methods…
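The abstract describes an algorithm that alternates between two kinds of updates. As a rough illustration of what such a dynamic schedule could look like, here is a minimal Python sketch. The environment/policy interface (env, policy.act, policy.update_rl, policy.update_supervised, expert_demo) is hypothetical, and the failure-triggered switch between RL and demonstration learning is one plausible reading of "dynamically schedule", not the paper's verbatim algorithm.

def rollout(env, policy):
    """Sample one episode with the current policy.

    Returns the trajectory and whether the agent reached the goal.
    env and policy are hypothetical stand-ins, not the paper's API.
    """
    state, done, success = env.reset(), False, False
    trajectory = []
    while not done:
        action = policy.act(state)
        next_state, reward, done, success = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory, success

def scheduled_policy_optimization(env, policy, expert_demo, episodes=1000):
    """One plausible schedule: reinforce on success, imitate on failure."""
    for _ in range(episodes):
        trajectory, success = rollout(env, policy)
        if success:
            # The agent solved the episode on its own:
            # apply a policy-gradient (RL) update to its own trajectory.
            policy.update_rl(trajectory)
        else:
            # The agent failed: fall back to a supervised (LfD) update
            # on an expert demonstration for the same instruction.
            policy.update_supervised(expert_demo(env))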

Cited by 5 publications (7 citation statements). References 6 publications.
“…Mapping instruction to action has been studied extensively with intermediate symbolic representations (e.g., Chen and Mooney, 2011; Kim and Mooney, 2012; Artzi and Zettlemoyer, 2013; Artzi et al., 2014; Misra et al., 2015, 2016). Recently, there has been growing interest in direct mapping from raw visual observations to actions (Misra et al., 2017; Xiong et al., 2018; Anderson et al., 2018; Fried et al., 2018). We propose a model that enjoys the benefits of such direct mapping, but explicitly decomposes the task into interpretable goal prediction and action generation.…”
Section: Related Work (mentioning, confidence: 99%)
“…Executing instructions in interactive environments requires mapping natural language and observations to actions. Recent approaches propose learning to directly map from inputs to actions, for example given language and either structured observations (Mei et al., 2016; Suhr and Artzi, 2018) or raw visual observations (Misra et al., 2017; Xiong et al., 2018). Rather than using a combination of models, these approaches learn a single model to solve language, perception, and planning challenges.…”
Section: Introduction (mentioning, confidence: 99%)
“…Language grounding refers to interpreting language in a situated context and includes collaborative language grounding toward situated human-robot dialog (Chai et al., 2016), city exploration (Boye et al., 2014), as well as following high-level navigation instructions. Mapping instructions to low-level actions has been explored in structured environments by mapping raw visual representations of the world and text onto actions using Reinforcement Learning methods (Misra et al., 2017; Xiong et al., 2018; Huang et al., 2019). This work has recently been extended to controlling autonomous systems and robots through human language instruction in a 3D simulated environment (Ma et al., 2019; Misra et al., 2018; Blukis et al., 2019) and Mixed Reality (Huang et al., 2019), and using imitation learning.…”
Section: Related Work (mentioning, confidence: 99%)
“…Misra et al. [21] formulate navigation as a sequential decision process and propose to use reward shaping to effectively train the RL agent. In the same environment, Xiong et al. [37] propose a scheduled training mechanism which yields more efficient exploration and achieves better results. However, these methods still operate in synthetic environments and consider either simple discrete observation inputs or an unrealistic top-down view of the environment.…”
Section: Related Work (mentioning, confidence: 99%)
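The statement above mentions reward shaping. For readers unfamiliar with the technique, here is a generic potential-based shaping sketch; the potential function phi (e.g., negative distance to the goal) is a hypothetical choice for illustration, not necessarily the shaping used by Misra et al. [21]. Potential-based shaping is attractive because it densifies the reward signal while provably leaving the optimal policy unchanged.

def shaped_reward(reward, state, next_state, phi, gamma=0.99):
    """Potential-based reward shaping.

    Adds gamma * phi(next_state) - phi(state) to the environment reward.
    phi maps a state to a scalar progress estimate, e.g. the negative
    distance from the agent to the goal (a hypothetical choice here).
    """
    return reward + gamma * phi(next_state) - phi(state)

# Example: shaping a sparse goal-only reward with a Manhattan-distance potential.
# goal = (3, 4)
# phi = lambda s: -(abs(s[0] - goal[0]) + abs(s[1] - goal[1]))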