We present efficient algorithms for computing optimal or approximately optimal strategies in a zero-sum game in which Player I has n pure strategies and Player II has an arbitrary number of pure strategies. We assume that for any given mixed strategy of Player I, a best response or approximate best response of Player II can be found by an oracle in time polynomial in n. We then show how our algorithms may be applied to several search games with applications to security and counter-terrorism. We evaluate our main algorithm experimentally on a prototypical search game. Our results show that it performs well compared to an existing, well-known algorithm for solving zero-sum games that, given a best response oracle, can also be used to solve search games.
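The abstract does not spell out the oracle-based scheme, but a standard way to exploit a best response oracle in a zero-sum game is the multiplicative-weights (Hedge) update: Player I maintains a mixed strategy over her n pure strategies, queries the oracle for Player II's (approximate) best response, and reweights against that response's payoffs. The sketch below is a minimal illustration under that assumption; `payoff` and `best_response` are hypothetical stand-ins for the game and the oracle, not the paper's algorithm.

```python
import numpy as np

def solve_zero_sum_with_oracle(n, payoff, best_response, iters=1000, eta=0.1):
    """Approximate Player I's optimal mixed strategy via multiplicative weights.

    payoff(i, j):      Player I's payoff when I plays pure strategy i, II plays j.
    best_response(x):  oracle returning Player II's (approximate) best response
                       to Player I's mixed strategy x (assumed polynomial in n).
    """
    weights = np.ones(n)
    avg = np.zeros(n)
    for _ in range(iters):
        x = weights / weights.sum()            # current mixed strategy
        j = best_response(x)                   # single oracle call per round
        losses = np.array([-payoff(i, j) for i in range(n)])
        weights *= np.exp(-eta * losses)       # Hedge update
        weights /= weights.max()               # rescale for numerical stability
        avg += x
    return avg / iters                         # average iterate of a no-regret
                                               # learner approaches the minimax
                                               # strategy in a zero-sum game
```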
In this paper we consider the basic version of Reinforcement Learning (RL) that involves computing optimal data-driven (adaptive) policies for Markovian decision processes with unknown transition probabilities. We provide a brief survey of the state of the art in this area and compare the performance of the classic UCB policy of Burnetas and Katehakis [9] with a new policy developed herein, which we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and a method based on posterior sampling (MDP-PS).
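The posterior-sampling method (MDP-PS) is only named in the abstract; the sketch below illustrates the generic idea under common assumptions: keep a Dirichlet posterior over each state-action transition distribution, sample a model, solve it by value iteration, and act greedily on the sample. The function name and episode structure are illustrative, not the paper's implementation.

```python
import numpy as np

def psrl_policy(counts, rewards, gamma=0.95, tol=1e-6):
    """One planning step of posterior sampling for an MDP.

    counts[s, a]:  vector of observed transition counts to each next state.
    rewards[s, a]: known (or estimated) mean rewards.
    Returns a policy that is greedy for the sampled model.
    """
    n_states, n_actions = rewards.shape
    # Sample a transition model: Dirichlet(1 + counts) posterior per (s, a).
    P = np.array([[np.random.dirichlet(1 + counts[s, a])
                   for a in range(n_actions)] for s in range(n_states)])
    # Solve the sampled MDP by value iteration.
    V = np.zeros(n_states)
    while True:
        Q = rewards + gamma * (P @ V)          # Q[s, a] = r(s, a) + gamma E[V(s')]
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            break
        V = V_new
    return Q.argmax(axis=1)                    # greedy policy for the sample
```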
In this paper we derive an efficient method for computing the indices associated with an asymptotically optimal upper confidence bound algorithm (MDP-UCB) of Burnetas and Katehakis (1997) that only requires solving a system of two non-linear equations with two unknowns, irrespective of the cardinality of the state space of the Markovian decision process (MDP). In addition, we develop a similar acceleration for computing the indices of the MDP-Deterministic Minimum Empirical Divergence (MDP-DMED) algorithm developed in Cowan et al. (2019), based on ideas from Honda and Takemura (2011), which involves solving a single equation in one variable. We provide experimental results demonstrating the computational time savings and regret performance of these algorithms. In these comparisons we also consider the Optimistic Linear Programming (OLP) algorithm (Tewari and Bartlett, 2008) and a method based on posterior sampling (MDP-PS).
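The paper's two-equation reduction for the MDP-UCB index and one-equation form for MDP-DMED are not reproduced in the abstract; the sketch below only illustrates the computational pattern with standard SciPy root-finders. The functions `F` (two equations, two unknowns) and `g` (a scalar equation) are placeholder assumptions; the actual equations come from the KL-divergence characterizations in Burnetas and Katehakis (1997) and Honda and Takemura (2011).

```python
import numpy as np
from scipy.optimize import fsolve, brentq

def F(z, params):
    """Placeholder 2x2 system standing in for the MDP-UCB index equations."""
    x, lam = z
    a, b = params
    return [x - a * np.exp(-lam), lam * x - b]

def mdp_ucb_index(params, z0=(1.0, 1.0)):
    """Index via a 2x2 nonlinear solve, independent of the state-space size."""
    x, lam = fsolve(F, z0, args=(params,))
    return x

def g(t, c):
    """Placeholder scalar equation standing in for the MDP-DMED condition."""
    return np.log1p(t) - c * t

def mdp_dmed_index(c, lo=1e-8, hi=10.0):
    """Index via a one-dimensional bracketing root solve.

    Assumes c in (0, 1) so that g changes sign on [lo, hi].
    """
    return brentq(g, lo, hi, args=(c,))
```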