Neural-Linear Architectures for Sequential Decision Making

Zahavy, Tom; Haroush, Matan; Merlis, Nadav; Mankowitz, Daniel J.; Mannor, Shie

doi:10.1109/indiancc.2019.8715603

Cited by 9 publications

(10 citation statements)

References 0 publications

“…In certain domains, the number of actions available to an agent is large, which can greatly affect scalability. Previous work in RL has considered decomposing actions into independent sub-actions [49], generalizing across similar actions by embedding them in a continuous space [50], or learning which actions to eliminate via supervision provided by the environment [51]. Existing approaches in planning consider progressively widening the search based on a heuristic [52] or learning a partial policy for eliminating actions in the search tree [53].…”

Section: Proposed Methods (mentioning)

confidence: 99%

Planning spatial networks with Monte Carlo tree search

2023

We tackle the problem of goal-directed graph construction: given a starting graph, finding a set of edges whose addition maximally improves a global objective function. This problem emerges in many transportation and infrastructure networks that are of critical importance to society. We identify two significant shortcomings of present reinforcement learning methods: their exclusive focus on topology to the detriment of spatial characteristics (which are known to influence the growth and density of links), as well as the rapid growth in the action spaces and costs of model training. Our formulation as a deterministic Markov decision process allows us to adopt the Monte Carlo tree search framework, an artificial intelligence decision-time planning method. We propose improvements over the standard upper confidence bounds for trees (UCT) algorithm for this family of problems that addresses their single-agent nature, the trade-off between the cost of edges and their contribution to the objective, and an action space linear in the number of nodes. Our approach yields substantial improvements over UCT for increasing the efficiency and attack resilience of synthetic networks and real-world Internet backbone and metro systems, while using a wall clock time budget similar to other search-based algorithms. We also demonstrate that our approach scales to significantly larger networks than previous reinforcement learning methods, since it does not require training a model.
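For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of the child-selection rule in standard UCT (the UCB1 formula applied at each tree node). It is not the improved variant the cited paper proposes; the function names, the `children` dictionary layout, and the default exploration constant are assumptions chosen for illustration.

```python
import math


def uct_score(value_sum, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score of one child node, as used by standard UCT.

    value_sum and visits describe the child; parent_visits is the visit
    count of its parent. c is the exploration constant (assumed default).
    """
    if visits == 0:
        # Unvisited children are expanded before any visited one.
        return math.inf
    mean_value = value_sum / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return mean_value + exploration


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the action whose child has the highest UCT score.

    children maps an action to a (value_sum, visits) pair; this layout
    is hypothetical and only serves the example.
    """
    return max(
        children,
        key=lambda action: uct_score(*children[action], parent_visits, c),
    )
```

With `c=0` the rule reduces to greedy selection on the mean value; larger values of `c` favor children that have been visited less often.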

“…However, these approaches require the assumption that actions have dense semantic information, consist of natural language, and cannot be applied to general high-dimensional tasks. Some works propose solutions for generic large-scale action spaces, such as dividing the action space by using multiple hierarchical policies similar to a tree structure to reduce the action dimension of each layer of the policy [10,14,96]; or gradually increasing the action space employing curriculum learning so that the policy only needs to be optimized in a smaller action space in the early stage [20].…”

Section: A2 Structured or Large-scale Actions (mentioning)

confidence: 99%

Diverse Policy Optimization for Structured Action Space

Li, Wang, Yang et al. 2023

Preprint

Enhancing the diversity of policies is beneficial for robustness, exploration, and transfer in reinforcement learning (RL). In this paper, we aim to seek diverse policies in an under-explored setting, namely RL tasks with structured action spaces with the two properties of composability and local dependencies. The complex action structure, non-uniform reward landscape, and subtle hyperparameter tuning due to the properties of structured actions prevent existing approaches from scaling well. We propose a simple and effective RL method, Diverse Policy Optimization (DPO), to model the policies in structured action space as the energy-based models (EBM) by following the probabilistic RL framework. A recently proposed novel and powerful generative model, GFlowNet, is introduced as the efficient, diverse EBM-based policy sampler. DPO follows a joint optimization framework: the outer layer uses the diverse policies sampled by the GFlowNet to update the EBM-based policies, which supports the GFlowNet training in the inner layer. Experiments on ATSC and Battle benchmarks demonstrate that DPO can efficiently discover surprisingly diverse policies in challenging scenarios and substantially outperform existing state-of-the-art methods.

“…In addition, the template-based action space is introduced where the agent selects first a template, and then a verb-object pair either individually (Hausknecht et al, 2020) or conditioned on the selected template (Ammanabrolu and Hausknecht, 2020). Even using the reduced action space, approaches filtering unnecessary actions can further improve the computational tractability and speed up the learning convergence (Zahavy et al, 2018; Jain et al, 2020).…”

Section: Combinatorial Action Space in TBGs (mentioning)

confidence: 99%

“…For example, some works consider a set of currently admissible actions (He et al, 2016), or a template-based action space (Hausknecht et al, 2020). Alternatively, some other works alleviated this challenge by filtering inadmissible actions through methods such as action affordance (Jain et al, 2020), bandit-based elimination (Zahavy et al, 2018) and rule-based scoring (Ammanabrolu and Riedl, 2019).…”

Section: Introduction (mentioning)

confidence: 99%

Self-imitation Learning for Action Generation in Text-based Games

Shi, Xu, Fang et al. 2023

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

In this work, we study reinforcement learning (RL) in solving text-based games. We address the challenge of combinatorial action space, by proposing a confidence-based self-imitation model to generate action candidates for the RL agent. Firstly, we leverage the self-imitation learning to rank and exploit past valuable trajectories to adapt a pre-trained language model (LM) towards a target game. Then, we devise a confidence-based strategy to measure the LM's confidence with respect to a state, thus adaptively pruning the generated actions to yield a more compact set of action candidates. In multiple challenging games, our model demonstrates promising performance in comparison to the baselines.
