We consider decentralized control of Markov decision processes and give complexity bounds on the worst-case running time for algorithms that find optimal solutions. Generalizations of both the fully observable case and the partially observable case that allow for decentralized control are described. Even for two agents, the finite-horizon problems corresponding to both of these models are hard for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov decision processes. In contrast to the problems involving centralized control, the problems we consider provably do not admit polynomial-time algorithms. Furthermore, assuming EXP ≠ NEXP, the problems require superexponential time to solve in the worst case.
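The decentralized, partially observable generalization referred to above is commonly formalized as a tuple in which several agents share one underlying Markov process but each receives its own observation and selects its own action. The sketch below is only an illustrative data structure, assuming this standard tuple formulation; the field names and types are not the paper's notation.

```python
# A minimal sketch of a decentralized, partially observable MDP (DEC-POMDP-style
# model), assuming the usual tuple <S, joint actions, P, R, joint observations, O>.
# All names and the dictionary-based encoding are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class DecentralizedPOMDP:
    states: List[str]
    joint_actions: List[Tuple[str, ...]]            # one action component per agent
    joint_observations: List[Tuple[str, ...]]       # one observation component per agent
    transition: Dict[Tuple[str, Tuple[str, ...]], Dict[str, float]]                  # P(s' | s, a)
    observation: Dict[Tuple[str, Tuple[str, ...]], Dict[Tuple[str, ...], float]]     # P(o | s', a)
    reward: Callable[[str, Tuple[str, ...]], float]                                  # shared reward R(s, a)
    horizon: int                                    # finite horizon, as in the hardness results
```

In the fully observable (decentralized MDP) special case, the agents' joint observations jointly determine the state; the hardness results discussed above apply to both variants.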
We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.
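A minimal, self-contained sketch of the kind of API loop described above: the value-function learning step is replaced by training a classifier that maps states to actions chosen by policy rollout. The tabular random MDP, the one-hot state features, and the decision-tree learner are illustrative assumptions standing in for the paper's relational policy language and learner.

```python
# Sketch of approximate policy iteration with learning in policy space,
# under the illustrative assumptions stated above.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
N_S, N_A, GAMMA = 20, 3, 0.95
P = rng.dirichlet(np.ones(N_S), size=(N_S, N_A))   # P[s, a] is a next-state distribution
R = rng.random((N_S, N_A))                          # reward table R[s, a]

def rollout_q(s, a, policy, horizon=30, n_samples=20):
    """Monte-Carlo estimate of Q(s, a): take action a, then follow `policy`."""
    total = 0.0
    for _ in range(n_samples):
        state, action, discount, ret = s, a, 1.0, 0.0
        for _ in range(horizon):
            ret += discount * R[state, action]
            state = rng.choice(N_S, p=P[state, action])
            discount *= GAMMA
            action = policy(state)
        total += ret
    return total / n_samples

def features(s):
    return np.eye(N_S)[s]                           # one-hot state features (illustrative)

policy = lambda s: 0                                # uninformed initial policy
for _ in range(3):                                  # a few API iterations
    X, y = [], []
    for s in range(N_S):                            # training states
        q = [rollout_q(s, a, policy) for a in range(N_A)]
        X.append(features(s))
        y.append(int(np.argmax(q)))                 # rollout-improved action label
    clf = DecisionTreeClassifier().fit(np.array(X), np.array(y))
    policy = lambda s, clf=clf: int(clf.predict(features(s).reshape(1, -1))[0])
```

The random-walk bootstrapping routine for sparse-reward, goal-based domains is not shown; in this sketch the uninformed initial policy suffices only because rewards are dense.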
In this paper, we introduce the notion of a bounded parameter Markov decision process (BMDP) as a generalization of the familiar exact MDP. A bounded parameter MDP is a set of exact MDPs specified by giving upper and lower bounds on transition probabilities and rewards (all the MDPs in the set share the same state and action space). BMDPs form an efficiently solvable special case of the already known class of MDPs with imprecise parameters (MDPIPs). Bounded parameter MDPs can be used to represent variation or uncertainty concerning the parameters of sequential decision problems in cases where no prior probabilities on the parameter values are available. Bounded parameter MDPs can also be used in aggregation schemes to represent the variation in the transition probabilities for different base states aggregated together in the same aggregate state. We introduce interval value functions as a natural extension of traditional value functions. An interval value function assigns a closed real interval to each state, representing the assertion that the value of that state falls within that interval. An interval value function can be used to bound the performance of a policy over the set of exact MDPs associated with a given bounded parameter MDP. We describe an iterative dynamic programming algorithm called interval policy evaluation which computes an interval value function for a given BMDP and specified policy. Interval policy evaluation on a policy π computes the most restrictive interval value function that is sound, i.e., that bounds the value function for π in every exact MDP in the set defined by the bounded parameter MDP. We define optimistic and pessimistic notions of optimal policy, and provide a variant of value iteration [Bellman, 1957] that we call interval value iteration, which computes policies for a BMDP that are optimal in these senses.
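A minimal illustrative sketch of interval value iteration for a BMDP, assuming transition probabilities are given as componentwise [lower, upper] bounds. The greedy allocation of probability mass over a box intersected with the simplex is a standard construction; the array layout, function names, and the optimistic/pessimistic backups below are assumptions for illustration, not the authors' implementation.

```python
# Sketch of interval value iteration for a bounded parameter MDP (BMDP),
# under the illustrative assumptions stated above.
import numpy as np

def extreme_distribution(p_lo, p_hi, values, maximize=True):
    """Choose a transition distribution within [p_lo, p_hi] (componentwise)
    that maximizes (or minimizes) the expected value of `values`.
    Greedy: start from the lower bounds, then push the remaining probability
    mass toward the most (or least) valuable successor states."""
    p = p_lo.copy()
    slack = 1.0 - p.sum()               # probability mass still to distribute
    order = np.argsort(values)          # ascending order of successor values
    if maximize:
        order = order[::-1]             # descending: best successors first
    for s in order:
        add = min(p_hi[s] - p[s], slack)
        p[s] += add
        slack -= add
        if slack <= 1e-12:
            break
    return p

def interval_value_iteration(R, P_lo, P_hi, gamma=0.95, iters=200):
    """R[s, a]: reward; P_lo/P_hi[s, a, s']: transition-probability bounds.
    Returns pessimistic (lower) and optimistic (upper) optimal value bounds."""
    n_states, n_actions = R.shape
    V_hi = np.zeros(n_states)
    V_lo = np.zeros(n_states)
    for _ in range(iters):
        Q_hi = np.empty((n_states, n_actions))
        Q_lo = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                p_best = extreme_distribution(P_lo[s, a], P_hi[s, a], V_hi, True)
                p_worst = extreme_distribution(P_lo[s, a], P_hi[s, a], V_lo, False)
                Q_hi[s, a] = R[s, a] + gamma * p_best @ V_hi
                Q_lo[s, a] = R[s, a] + gamma * p_worst @ V_lo
        V_hi, V_lo = Q_hi.max(axis=1), Q_lo.max(axis=1)
    return V_lo, V_hi
```

Interval policy evaluation, as described in the abstract, follows the same pattern but backs up only the action prescribed by the given policy rather than maximizing over actions.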