2010
DOI: 10.1007/s10472-010-9216-8

Ranking policies in discrete Markov decision processes

Abstract: An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding an optimal policy. In this paper, we study the k best policies problem. The problem is to find the k best policies of a discrete Markov decision process. The k best policies, k > 1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponent…
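
Since the problem is posed over a discrete MDP solved by dynamic programming, a minimal value-iteration sketch may help frame the optimal-policy baseline that the k best policies generalize. This is an illustrative sketch only: the reward-maximization formulation, discount factor, and data structures below are assumptions, not details from the paper.

```python
# Minimal value-iteration sketch for a discrete MDP (illustrative only).
# The MDP encoding (states, actions, transition probabilities, rewards,
# discount factor) is an assumed toy formulation, not taken from the paper.

def value_iteration(states, actions, P, R, gamma=0.95, eps=1e-6):
    """P[s][a] -> list of (next_state, prob); R[s][a] -> immediate reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q = [R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                 for a in actions[s]]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # Extract a greedy (optimal) policy from the converged values.
    policy = {}
    for s in states:
        policy[s] = max(actions[s],
                        key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
    return V, policy
```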

Cited by 6 publications (3 citation statements). References 11 publications.
“…Initially, we tried to find a set of top-k suboptimal policies that acting agents could take (Dai and Goldsmith 2009), where the next best policy differs from the previous one in only one state. However, as Dai and Goldsmith (2010) mention, there are usually multiple "trivially extended policies" that differ from one another only in a non-reachable state. Additionally, the tie-breaking rule uses a lexicographic order, which sometimes excludes policies with the same expected cost.…”
Section: Bounded Suboptimality
confidence: 99%
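
The statement above refers to a neighbourhood structure in which the next best policy differs from the previous one at exactly one state, with lexicographic tie-breaking. The sketch below illustrates that idea only; evaluate_policy, the MDP dictionaries, and the tie-break key are assumptions for illustration, not the construction used by Dai and Goldsmith. Policies are ranked by value at a start state (a cost formulation would simply sort ascending instead).

```python
# Illustrative sketch of the "differs in exactly one state" neighbourhood idea.
# The helpers and MDP structures here are assumed, not the authors' implementation.

def evaluate_policy(policy, states, P, R, gamma=0.95, eps=1e-8):
    """Iterative policy evaluation; returns expected discounted reward per state."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            a = policy[s]
            v = R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < eps:
            break
    return V

def one_state_neighbours(policy, actions):
    """All policies obtained by changing the action at exactly one state."""
    for s, a_old in policy.items():
        for a_new in actions[s]:
            if a_new != a_old:
                neighbour = dict(policy)
                neighbour[s] = a_new
                yield neighbour

def rank_neighbours(policy, states, actions, P, R, start):
    """Best-first ordering of single-state modifications by value at the start
    state, breaking ties lexicographically on the (state, action) assignment
    (assumes states and actions are comparable, e.g. strings)."""
    scored = []
    for nb in one_state_neighbours(policy, actions):
        V = evaluate_policy(nb, states, P, R)
        tie_key = tuple(sorted(nb.items()))
        scored.append((-V[start], tie_key, nb))
    scored.sort(key=lambda t: (t[0], t[1]))
    return [nb for _, _, nb in scored]
```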
“…Recall that to solve that problem we need to compute a policy that minimizes the expected number of steps to reach a goal state, and observe that, in this case, the Goal MDP underlying the Goal POMDP in Definition 6 is a layered DAG. Thus, to compute an optimal policy for such a Goal MDP, we can use the topological value iteration algorithm of Dai and Goldsmith (2007).…”
Section: The DFA Is Loop-omitted Acyclic
confidence: 99%
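
For the acyclic case described in this statement, a single backup per state in reverse topological order suffices. The sketch below illustrates that idea on a DAG Goal MDP; it is a simplified, assumption-laden sketch, not the full topological value iteration algorithm of Dai and Goldsmith (2007), which also handles cyclic MDPs by decomposing them into strongly connected components.

```python
# Simplified sketch of value iteration on an acyclic (DAG) Goal MDP: back up
# states in reverse topological order so each state is updated exactly once.
# Data structures are assumed; the MDP is assumed to be a DAG (no self-loops).

from graphlib import TopologicalSorter  # Python 3.9+

def dag_goal_mdp_values(states, goals, actions, P, C):
    """P[s][a] -> list of (next_state, prob); C[s][a] -> step cost (e.g. 1).
    Returns expected cost-to-goal V and a greedy policy."""
    # Successor graph: edge s -> s2 whenever s2 can follow s with positive probability.
    succ = {s: set() for s in states}
    for s in states:
        for a in actions.get(s, []):
            for s2, p in P[s][a]:
                if p > 0:
                    succ[s].add(s2)
    # Passing successors to TopologicalSorter yields successors before s,
    # so V is already known for every successor when s is backed up.
    order = list(TopologicalSorter(succ).static_order())
    V, policy = {}, {}
    for s in order:
        if s in goals:
            V[s] = 0.0
            continue
        q = {a: C[s][a] + sum(p * V[s2] for s2, p in P[s][a])
             for a in actions[s]}
        policy[s] = min(q, key=q.get)
        V[s] = q[policy[s]]
    return V, policy
```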
“…One can use prioritization to decrease the number of inefficient backups. Faster dynamic programming [8] and ranking policies in discrete Markov Decision Processes [9] are two recent examples.…”
Section: Introduction
confidence: 99%
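
The prioritization idea mentioned here can be illustrated with a priority queue keyed by Bellman residual, so that the states most in need of a backup are updated first. The sketch below is in the spirit of prioritized sweeping; it is not the specific algorithm of [8] or [9], and all data structures are assumed.

```python
# Illustrative prioritized-backup sketch: states are backed up in order of their
# Bellman residual instead of in full sweeps. MDP structures are assumed.

import heapq

def prioritized_value_iteration(states, actions, P, R, gamma=0.95, eps=1e-6):
    """P[s][a] -> list of (next_state, prob); R[s][a] -> immediate reward."""
    # Predecessor map: which states can reach s in one step (used to propagate changes).
    preds = {s: set() for s in states}
    for s in states:
        for a in actions[s]:
            for s2, p in P[s][a]:
                if p > 0:
                    preds[s2].add(s)

    V = {s: 0.0 for s in states}

    def bellman(s):
        return max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                   for a in actions[s])

    # Max-heap (negated priorities) ordered by current Bellman residual.
    heap = [(-abs(bellman(s) - V[s]), s) for s in states]
    heapq.heapify(heap)
    while heap:
        neg_res, s = heapq.heappop(heap)
        if -neg_res < eps:
            break  # largest remaining residual is already small enough
        V[s] = bellman(s)
        # Updating s may change the residual of its predecessors; re-queue them.
        for s_prev in preds[s]:
            res = abs(bellman(s_prev) - V[s_prev])
            if res >= eps:
                heapq.heappush(heap, (-res, s_prev))
    return V
```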