2010
DOI: 10.1007/s10472-010-9216-8

Ranking policies in discrete Markov decision processes

Abstract: An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding an optimal policy. In this paper, we study the k best policies problem. The problem is to find the k best policies of a discrete Markov decision process. The k best policies, k > 1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponent…
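
Since the problem is posed over a discrete MDP solved by dynamic programming, a minimal value-iteration sketch may help frame the optimal-policy baseline that the k best policies generalize. This is an illustrative sketch only: the reward-maximization formulation, discount factor, and data structures below are assumptions, not details from the paper.

```python
# Minimal value-iteration sketch for a discrete MDP (illustrative only).
# The MDP encoding (states, actions, transition probabilities, rewards,
# discount factor) is an assumed toy formulation, not taken from the paper.

def value_iteration(states, actions, P, R, gamma=0.95, eps=1e-6):
    """P[s][a] -> list of (next_state, prob); R[s][a] -> immediate reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q = [R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                 for a in actions[s]]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # Extract a greedy (optimal) policy from the converged values.
    policy = {}
    for s in states:
        policy[s] = max(actions[s],
                        key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
    return V, policy
```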

Cited by 6 publications (3 citation statements). References 11 publications.
“…Initially, we tried to find a set of top-k suboptimal policies that acting agents could take (Dai and Goldsmith 2009), where the next best policy differs from the previous one in only one state. However, as Dai and Goldsmith (2010) mention, there are usually multiple "trivially extended policies" that differ from one another only in a non-reachable state. Additionally, the tie-breaking rule uses a lexicographic order, which sometimes excludes policies with the same expected cost.…”
Section: Bounded Suboptimality
confidence: 99%
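
The statement above refers to a neighbourhood structure in which the next best policy differs from the previous one at exactly one state, with lexicographic tie-breaking. The sketch below illustrates that idea only; evaluate_policy, the MDP dictionaries, and the tie-break key are assumptions for illustration, not the construction used by Dai and Goldsmith. Policies are ranked by value at a start state (a cost formulation would simply sort ascending instead).

```python
# Illustrative sketch of the "differs in exactly one state" neighbourhood idea.
# The helpers and MDP structures here are assumed, not the authors' implementation.

def evaluate_policy(policy, states, P, R, gamma=0.95, eps=1e-8):
    """Iterative policy evaluation; returns expected discounted reward per state."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            a = policy[s]
            v = R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < eps:
            break
    return V

def one_state_neighbours(policy, actions):
    """All policies obtained by changing the action at exactly one state."""
    for s, a_old in policy.items():
        for a_new in actions[s]:
            if a_new != a_old:
                neighbour = dict(policy)
                neighbour[s] = a_new
                yield neighbour

def rank_neighbours(policy, states, actions, P, R, start):
    """Best-first ordering of single-state modifications by value at the start
    state, breaking ties lexicographically on the (state, action) assignment
    (assumes states and actions are comparable, e.g. strings)."""
    scored = []
    for nb in one_state_neighbours(policy, actions):
        V = evaluate_policy(nb, states, P, R)
        tie_key = tuple(sorted(nb.items()))
        scored.append((-V[start], tie_key, nb))
    scored.sort(key=lambda t: (t[0], t[1]))
    return [nb for _, _, nb in scored]
```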
“…Recall that to solve that problem we need to compute a policy that minimizes the expected number of steps to reach a goal state, and observe that, in this case, the Goal MDP underlying the Goal POMDP in Definition 6 is a layered DAG. Thus, to compute an optimal policy for such a Goal MDP, we can use the topological value iteration algorithm of Dai and Goldsmith (2007).…”
Section: The DFA Is Loop-omitted Acyclic
confidence: 99%
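
For the acyclic case described in this statement, a single backup per state in reverse topological order suffices. The sketch below illustrates that idea on a DAG Goal MDP; it is a simplified, assumption-laden sketch, not the full topological value iteration algorithm of Dai and Goldsmith (2007), which also handles cyclic MDPs by decomposing them into strongly connected components.

```python
# Simplified sketch of value iteration on an acyclic (DAG) Goal MDP: back up
# states in reverse topological order so each state is updated exactly once.
# Data structures are assumed; the MDP is assumed to be a DAG (no self-loops).

from graphlib import TopologicalSorter  # Python 3.9+

def dag_goal_mdp_values(states, goals, actions, P, C):
    """P[s][a] -> list of (next_state, prob); C[s][a] -> step cost (e.g. 1).
    Returns expected cost-to-goal V and a greedy policy."""
    # Successor graph: edge s -> s2 whenever s2 can follow s with positive probability.
    succ = {s: set() for s in states}
    for s in states:
        for a in actions.get(s, []):
            for s2, p in P[s][a]:
                if p > 0:
                    succ[s].add(s2)
    # Passing successors to TopologicalSorter yields successors before s,
    # so V is already known for every successor when s is backed up.
    order = list(TopologicalSorter(succ).static_order())
    V, policy = {}, {}
    for s in order:
        if s in goals:
            V[s] = 0.0
            continue
        q = {a: C[s][a] + sum(p * V[s2] for s2, p in P[s][a])
             for a in actions[s]}
        policy[s] = min(q, key=q.get)
        V[s] = q[policy[s]]
    return V, policy
```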
“…One can use prioritization to decrease the number of inefficient backups. Faster dynamic programming [8] and ranking policies in discrete Markov Decision Processes [9] are two recent examples.…”
Section: Introduction
confidence: 99%
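
The prioritization idea mentioned here can be illustrated with a priority queue keyed by Bellman residual, so that the states most in need of a backup are updated first. The sketch below is in the spirit of prioritized sweeping; it is not the specific algorithm of [8] or [9], and all data structures are assumed.

```python
# Illustrative prioritized-backup sketch: states are backed up in order of their
# Bellman residual instead of in full sweeps. MDP structures are assumed.

import heapq

def prioritized_value_iteration(states, actions, P, R, gamma=0.95, eps=1e-6):
    """P[s][a] -> list of (next_state, prob); R[s][a] -> immediate reward."""
    # Predecessor map: which states can reach s in one step (used to propagate changes).
    preds = {s: set() for s in states}
    for s in states:
        for a in actions[s]:
            for s2, p in P[s][a]:
                if p > 0:
                    preds[s2].add(s)

    V = {s: 0.0 for s in states}

    def bellman(s):
        return max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                   for a in actions[s])

    # Max-heap (negated priorities) ordered by current Bellman residual.
    heap = [(-abs(bellman(s) - V[s]), s) for s in states]
    heapq.heapify(heap)
    while heap:
        neg_res, s = heapq.heappop(heap)
        if -neg_res < eps:
            break  # largest remaining residual is already small enough
        V[s] = bellman(s)
        # Updating s may change the residual of its predecessors; re-queue them.
        for s_prev in preds[s]:
            res = abs(bellman(s_prev) - V[s_prev])
            if res >= eps:
                heapq.heappush(heap, (-res, s_prev))
    return V
```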