Point-based online value iteration algorithm in large POMDP

Wu, Bo; Zheng, Hui; Feng, Yaokai

doi:10.1007/s10489-013-0479-8

Cited by 7 publications

(2 citation statements)

References 17 publications

“…The belief states space is the posterior distribution of the tracking system states conditioned on the observable history at each time [9]. However, planning of POMDPs over the belief states space is the curse of dimensionality which makes it impossible to obtain an online solution of target tracking scheduling [13].…”

Section: Posterior Belief Clustering (PBC) Algorithm (mentioning)

confidence: 99%

Posterior Belief Clustering Algorithm For Energy-Efficient Tracking In Wireless Sensor Networks

Feng

Zheng

2014

International Journal on Smart Sensing and Intelligent Systems

Self Cite

In this paper, we propose a novel posterior belief clustering (PBC) algorithm to solve the tradeoff between target tracking performance and sensors energy consumption in wireless sensor networks. We model the target tracking under dynamic uncertain environment using partially observable Markov decision processes (POMDPs), and transform the optimization of the tradeoff between tracking performance and energy consumption into yielding the optimal value function of POMDPs. We analyze the error of a class of continuous posterior beliefs by Kullback-Leibler (KL) divergence, and cluster these posterior beliefs into one based on the error of KL divergence. So, we calculate the posterior reward value only once for each cluster to eliminate repeated computation. The numerical results show that the proposed algorithm has its effectiveness in optimizing the tradeoff between tracking performance and energy consumption.
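The clustering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' PBC algorithm: it assumes discrete belief vectors (the abstract refers to continuous posterior beliefs), a greedy first-fit clustering rule, and a hypothetical `threshold` parameter and reward vector chosen only for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against log(0); it is an illustrative choice, not from the paper.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def cluster_beliefs(beliefs, threshold):
    """Greedy first-fit clustering (an assumption, not the paper's procedure).

    Each belief joins the first cluster whose representative is within
    `threshold` in KL divergence; otherwise it starts a new cluster.
    """
    representatives = []
    assignment = []
    for b in beliefs:
        for idx, rep in enumerate(representatives):
            if kl_divergence(b, rep) <= threshold:
                assignment.append(idx)
                break
        else:
            representatives.append(b)
            assignment.append(len(representatives) - 1)
    return representatives, assignment

def expected_reward(belief, reward):
    """Expected immediate reward of a belief under a per-state reward vector."""
    return sum(b * r for b, r in zip(belief, reward))

# Hypothetical beliefs over three states; the first two are nearly identical.
beliefs = [
    [0.70, 0.20, 0.10],
    [0.69, 0.21, 0.10],
    [0.10, 0.10, 0.80],
]
reps, assign = cluster_beliefs(beliefs, threshold=0.01)

# The reward is evaluated once per cluster rather than once per belief.
cluster_rewards = [expected_reward(rep, [1.0, 0.0, -1.0]) for rep in reps]
```

Here three beliefs require only two reward evaluations, which is the kind of repeated computation the abstract says clustering avoids. A larger threshold merges more beliefs at the cost of a larger approximation error.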

“…Formally, BRL can be reformulated as a partially observable Markov decision process (POMDP) [12,33,61]. This formulation is used in many proposals [9, 19, 33-35, 51, 58, 59].…”

Section: Introduction (mentioning)

confidence: 99%

Approximate planning for bayesian hierarchical reinforcement learning

et al. 2014

In this paper, we propose to use hierarchical action decomposition to make Bayesian model-based reinforcement learning more efficient and feasible for larger problems. We formulate Bayesian hierarchical reinforcement learning as a partially observable semi-Markov decision process (POSMDP). The main POSMDP task is partitioned into a hierarchy of POSMDP subtasks. Each subtask might consist of only primitive actions or hierarchically call other subtasks' policies, since the policies of lower-level subtasks are considered as macro actions in higher-level subtasks. A solution for this hierarchical action decomposition is to solve lower-level subtasks first, then higher-level ones. Because each formulated POSMDP has a continuous state space, we sample from a prior belief to build an approximate model for them, then solve by using a recently introduced Monte Carlo Value Iteration with Macro-Actions solver. We name this method Monte Carlo Bayesian Hierarchical Reinforcement Learning. Simulation results show that our algorithm exploiting the action hierarchy performs significantly better than flat Bayesian reinforcement learning in terms of both reward and, especially, solving time, by at least one order of magnitude.

Keywords: Reinforcement learning · Bayesian model-based RL · Bayesian reinforcement learning · Model-based reinforcement learning · Partially observable Markov decision process (POMDP) · Partially observable semi-MDP (POSMDP)

N. A. Vien, Machine Learning and Robotics Lab
