Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/675

PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making

Abstract: Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with the real world, which often requires an infeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a unified framework, PEORL, that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with…

Cited by 81 publications (64 citation statements). References: 0 publications.
“…Other Reinforcement Learning based methods: In [32], the authors also combine pipeline search and hyper-parameter optimization in a reinforcement learning process based on the PEORL [33] framework. However, the hyper-parameters are randomly sampled during the reinforcement learning process, and an extra stage is needed to sweep them using hyper-parameter optimization techniques, whereas in our work hyper-parameter optimization is embedded in the reinforcement learning process. Alpha3M [14] combined MCTS and a recurrent neural network in a self-play [27] fashion; however, Alpha3M does not appear to perform better than state-of-the-art AutoML systems.…”
Section: Reinforcement Learning Based Neural Network Architecture Search
confidence: 99%
“…Integrating robot task planning and learning of navigation costs has also been investigated [15]. Recent approaches such as PEORL [35] and SDRL [25] use closed-loop communication between planning and learning: an optimal symbolic plan is obtained through an iterative process of planning and learning, so that the two components mutually benefit each other. However, most of these approaches have only been applied to artificial domains.…”
Section: Related Work
confidence: 99%
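The closed loop described in this excerpt, where a symbolic planner proposes a plan, RL executes and evaluates its actions, and the learned values feed back into the next planning call, can be illustrated with a minimal sketch. This is not the PEORL or SDRL implementation: the planner and option-execution callables and the quality bookkeeping below are hypothetical placeholders.

```python
# Minimal sketch of a planning-learning loop in the spirit of PEORL/SDRL.
# `symbolic_plan` and `execute_option` are hypothetical callables standing in
# for the ASP planner and the option-level RL executor; this is not the
# authors' API.

def planning_learning_loop(symbolic_plan, execute_option, init_state, goal,
                           episodes=100):
    quality = {}                      # learned quality of each symbolic action
    best_plan, best_value = None, float("-inf")
    for _ in range(episodes):
        # Planning: request a plan whose estimated quality beats the best
        # value found so far, using learned qualities as action "costs".
        plan = symbolic_plan(init_state, goal, quality, lower_bound=best_value)
        if plan is None:              # no better plan exists: loop has converged
            break
        # Learning: execute each symbolic action as an RL option and record
        # the reward it actually achieved, feeding it back to the planner.
        plan_value = 0.0
        for state, action in plan:
            reward = execute_option(state, action)
            quality[(state, action)] = reward
            plan_value += reward
        if plan_value > best_value:   # keep the best plan seen so far
            best_plan, best_value = plan, plan_value
    return best_plan, quality
```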
“…These approaches were based on integrating symbolic planning with value-iteration methods of reinforcement learning, and in their work there was no bidirectional communication loop between planning and learning, so the two could not mutually benefit each other. The latest work in this direction is the PEORL framework [42] and SDRL [21], where ASP-based planning is integrated with R-learning [35] in a planning-learning loop. The PACMAN architecture is a new framework for integrating symbolic planning with RL, in particular integrating planning with the actor-critic (AC) algorithm for the first time, and it also features bidirectional communication between planning and learning.…”
Section: Related Work
confidence: 99%
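R-learning [35], referenced in this excerpt, is an average-reward (undiscounted) temporal-difference method: it learns action values relative to an estimate of the average reward per step. Below is a minimal tabular sketch of one update, following the textbook formulation of R-learning; the variable names and default step sizes are illustrative and not taken from the cited papers.

```python
def r_learning_update(R, rho, actions, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One tabular R-learning step (average-reward RL).

    R       : dict mapping (state, action) -> relative action value
    rho     : current estimate of the average reward per time step
    actions : finite set of actions assumed available in every state
    Returns the updated rho (R is updated in place).
    """
    best_here = max(R.get((s, a2), 0.0) for a2 in actions)
    best_next = max(R.get((s_next, a2), 0.0) for a2 in actions)
    was_greedy = R.get((s, a), 0.0) >= best_here     # was `a` a greedy choice?
    # Relative-value update: reward is measured against the average rate rho.
    R[(s, a)] = R.get((s, a), 0.0) + alpha * (r - rho + best_next - R.get((s, a), 0.0))
    # Only greedy steps adjust rho, so exploration does not bias the estimate.
    if was_greedy:
        rho += beta * (r - rho + best_next - best_here)
    return rho
```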
“…From the first perspective, research from the KR community on modular action languages [20,5,10] proposed formal languages that encode a general-purpose library of actions from which a wide range of benchmark planning problems can be defined as special cases, leading to a representation that is elaboration tolerant and addressing the problem of the generality of AI [24]. Meanwhile, researchers from the RL community focused on incorporating high-level abstraction into flat RL, leading to the options framework for hierarchical RL [2], hierarchical abstract machines [27], and, more recently, work that integrates symbolic knowledge represented in answer set programming (ASP) into the reinforcement learning framework [19,42,21,11]. From the second perspective, imitation learning, including learning from demonstration (LfD) [1] and inverse reinforcement learning (IRL) [26], tries to learn policies from examples provided by a human expert, or to learn directly from human feedback [39,15,6], a.k.a. human-centered reinforcement learning (HCRL).…”
Section: Introduction
confidence: 99%
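For reference, the options framework for hierarchical RL [2] mentioned in this excerpt models a temporally extended action as a triple of an initiation set, an intra-option policy, and a termination condition; in planning-plus-RL frameworks such as PEORL, each symbolic action is typically executed as such an option. A minimal illustration of that triple as a data structure follows (the field names are illustrative, not taken from the cited works).

```python
from dataclasses import dataclass
from typing import Any, Callable, Set

State = Any      # placeholder state type
Action = Any     # placeholder primitive-action type

@dataclass
class Option:
    """A temporally extended action in the options framework."""
    initiation_set: Set[State]              # states where the option may be invoked
    policy: Callable[[State], Action]       # intra-option (low-level) policy
    termination: Callable[[State], float]   # probability of terminating in a state

    def applicable(self, s: State) -> bool:
        return s in self.initiation_set
```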