2018
DOI: 10.48550/arxiv.1806.05631
Preprint

Learning in POMDPs with Monte Carlo Tree Search

Abstract: The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, …
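
To make the abstract's core idea concrete — learning the model during execution by augmenting the hidden state with Dirichlet counts — here is a minimal Python sketch. Everything below (class and method names, the uniform pseudo-count prior) is an assumption for illustration, not the paper's actual algorithm or API.

```python
import random
from collections import defaultdict

def _sample(items, weights):
    # Draw one item with probability proportional to its weight.
    return random.choices(items, weights=weights, k=1)[0]

class CountModel:
    """Illustrative BA-POMDP model: Dirichlet pseudo-counts chi over
    transitions (s, a, s') and psi over observations (s', a, o)."""

    def __init__(self):
        self.chi = defaultdict(lambda: 1.0)  # uniform prior pseudo-counts
        self.psi = defaultdict(lambda: 1.0)

    def step(self, s, a, states, observations):
        # Sample s' and o in proportion to the current counts, i.e. from
        # the expected transition/observation model under the Dirichlet.
        s2 = _sample(states, [self.chi[(s, a, x)] for x in states])
        o = _sample(observations, [self.psi[(s2, a, z)] for z in observations])
        return s2, o

    def update(self, s, a, s2, o):
        # Incrementing counts IS the Bayesian model update: each observed
        # (s, a, s', o) tuple adds evidence, so the model is learned
        # while acting rather than specified up front.
        self.chi[(s, a, s2)] += 1.0
        self.psi[(s2, a, o)] += 1.0
```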

Cited by 1 publication (3 citation statements)
References 0 publications

“…One restriction of this framework is that it assumes a fixed reactive probabilistic model of the opponent, implying stationary behavior without rationality. To mitigate performance degradation due to modeling uncertainty, existing approaches include Bayesian-Adaptive POMDP (BA-POMDP) [6], [7], robust POMDP [8], Chance-constrained POMDP (CC-POMDP) [9], and Interactive-POMDP (I-POMDP) [10].…”
Section: A. POMDP Framework (mentioning)
Confidence: 99%
“…BA-POMDP augments the state space with state-transition count and state-observation count variables as additional hidden states [6], [7]. It maintains a belief over this augmented state space, yielding an optimal trade-off between learning the model and collecting reward.…”
Section: A. POMDP Framework (mentioning)
Confidence: 99%
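
The "belief over the augmented state space" in this statement is commonly approximated with particles over (state, counts) pairs. Below is a small rejection-sampling sketch of such an update; the `step` and `update` callbacks are hypothetical interfaces assumed for illustration (e.g. the `CountModel` methods above), not an API from the cited papers.

```python
import copy
import random

def belief_update(particles, action, obs, step, update, max_tries=10_000):
    """Rejection-sampling belief update over (state, counts) particles.

    Each particle is a joint hypothesis about the hidden state and the
    unknown model (the "augmented state"). step(s, a, counts) -> (s', o)
    simulates the counts-implied dynamics; update(counts, s, a, s', o)
    increments the Dirichlet counts in place.
    """
    accepted, tries = [], 0
    while len(accepted) < len(particles) and tries < max_tries:
        tries += 1
        s, counts = random.choice(particles)
        counts = copy.deepcopy(counts)        # each particle owns its counts
        s2, o = step(s, action, counts)       # simulate one step
        if o == obs:                          # keep only particles whose
            update(counts, s, action, s2, o)  # simulated observation matches
            accepted.append((s2, counts))
    return accepted or particles              # fall back if obs was unlikely
```

Because each surviving particle also carries updated counts, a single belief update simultaneously tracks the hidden state and refines the model, which is the trade-off between model learning and reward collection the statement describes.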