A framework for learning and planning against switching strategies in repeated games

Hernández-Leal, Pablo; Cote, Enrique Muñoz de; Sucar, L. Enrique

doi:10.1080/09540091.2014.885294

Cited by 13 publications

(15 citation statements)

References 16 publications


“…4.5). Comparisons are performed with state of the art approaches: two of our previous approaches MDP4.5 [25] and MDP-CL [26]; R-max [7] used as baseline; FAL [18] since it is a fast learning algorithm in repeated games, WOLF-PHC [6] since it can learn non-stationary environments; and the omniscient (perfect) agent that best responds immediately to switches. Results are compared in terms of average utility over the repeated game.…”

Section: Methods (mentioning)

confidence: 99%

“…Another approach that uses MDPs to represent opponent strategies is the MDP-CL approach [26]. We introduced MDP-CL in previous work to act against non-stationary opponents (see Algorithm 1).…”

Section: Algorithm 1: MDP-CL [26] (mentioning)

confidence: 99%

“…a Boltzmann distribution), can be used for such purpose and will work as DE with the added cost of not efficiently exploring the state space. We present this general version of DE into the MDP-CL framework [26] which yields the MDP-CL(DE) approach, tested experimentally in Sect. 6.3.…”

Section: General Drift Exploration (mentioning)

confidence: 99%

“…In fact, these two approaches tackle the same problem in different ways and therefore should probably complement each other at the expense of some extra exploration. We call this algorithm R-max# CL, which combines MDP-CL [26] synchronous updates with the asynchronous rapid adaptation of R-max#. The approach of R-max# CL is presented in Algorithm 3.…”

Section: Efficient Drift Exploration With Switch Detection (mentioning)

confidence: 99%

“…In the experimental section we compared our proposals against MDP4.5 [25], MDP-CL [26], R-max [7], FAL [18] and WOLF-PHC [6].…”

Section: Algorithm 1: MDP-CL [26] (mentioning)

confidence: 99%


An exploration strategy for non-stationary opponents

Hernández-Leal, Zhan, Taylor, et al. 2016. Auton Agent Multi-Agent Syst

Self Cite


The success or failure of any learning algorithm is partially due to the exploration strategy it exerts. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This proposed exploration is general enough to be applied in single-agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy over time. We use a two-agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent's objective is to learn a model of the opponent's strategy to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) to eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent's switch and learn a new model in terms of finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE, to deal with the non-stationary nature of the opponent. We show experimentally how using DE outperforms the state-of-the-art algorithms that were explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.

Most of this work was performed while the first author was a graduate student at INAOE. This paper extends the paper "Exploration strategies to detect strategy switches" presented at the Adaptive Learning Agents workshop [27].
