2007
DOI: 10.1007/s10458-007-9013-x

Generalized multiagent learning with performance bound

Abstract: We present new multiagent learning (MAL) algorithms with the general philosophy of policy convergence against some classes of opponents but otherwise ensuring high payoffs. We consider a 3-class breakdown of opponent types: (eventually) stationary, self-play and "other" (see Definition 4) agents. We start with ReDVaLeR, which can satisfy policy convergence against the first two types and no-regret against the third, but it needs to know the type of the opponents. This serves as a baseline to delineate the diffic…
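To make the no-regret criterion above concrete: a learner is no-regret if, in hindsight, no single fixed action would have earned substantially more than the learner actually did, with the gap growing sublinearly in the number of rounds. The following Python sketch (not from the paper; the payoff matrix and the history format are illustrative assumptions) computes that empirical gap for the row player of a repeated matrix game.

    import numpy as np

    def empirical_regret(payoff, my_actions, opp_actions):
        """Empirical regret of the row player in a repeated matrix game.

        payoff[i, j] -- row player's payoff for action i against column action j
                        (a hypothetical game, used here only for evaluation)
        my_actions   -- indices of the row player's chosen actions, one per round
        opp_actions  -- indices of the opponent's observed actions, one per round
        """
        payoff = np.asarray(payoff, dtype=float)
        earned = sum(payoff[a, o] for a, o in zip(my_actions, opp_actions))
        # Payoff of the best single action held fixed over the whole history.
        best_fixed = max(sum(payoff[i, o] for o in opp_actions)
                         for i in range(payoff.shape[0]))
        return best_fixed - earned

    # Matching pennies: always playing action 0 against 0,1,1,1 gives regret 4.
    pennies = [[1, -1], [-1, 1]]
    print(empirical_regret(pennies, [0, 0, 0, 0], [0, 1, 1, 1]))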

Cited by 14 publications (20 citation statements) · References 16 publications
“…Several MARL algorithms have been proposed and studied [5,11,18,22], all of which have some theoretical results of convergence in general-sum games. A common assumption of these algorithms is that an agent (or player) knows its own payoff matrix.…”
Section: Policy Learning Using PGA-APP
“…A common assumption of these algorithms is that an agent (or player) knows its own payoff matrix. To guarantee convergence, each algorithm has its own additional assumptions, such as requiring an agent to know a Nash equilibrium and the strategy of the other players [5,11,18], or to observe what actions other agents executed and what rewards they received [18,22]. For practical applications, these assumptions are very constraining and unlikely to hold; instead, an agent can only observe the immediate reward after selecting and performing an action.…”
Section: Policy Learning Using PGA-APP
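The observation model this citing paper treats as realistic (the agent sees only its own immediate reward after acting, never the payoff matrix or the other agents' actions and rewards) can be sketched as follows. This is a minimal illustration, not PGA-APP itself; the class name, learning rate, and softmax exploration are assumptions chosen for brevity.

    import numpy as np

    class RewardOnlyLearner:
        """Stateless learner for a repeated game that uses only its own
        immediate reward after acting; it never sees the payoff matrix,
        the opponents' actions, or their rewards. (Illustrative sketch.)"""

        def __init__(self, n_actions, alpha=0.1, temperature=0.5):
            self.q = np.zeros(n_actions)    # running value estimate per action
            self.alpha = alpha              # learning rate
            self.temperature = temperature  # softmax exploration temperature

        def act(self, rng):
            # Softmax over current value estimates.
            prefs = self.q / self.temperature
            probs = np.exp(prefs - prefs.max())
            probs /= probs.sum()
            return rng.choice(len(self.q), p=probs)

        def update(self, action, reward):
            # Only the realized reward for the chosen action is used.
            self.q[action] += self.alpha * (reward - self.q[action])

    # Usage: choose an action, then learn from the realized reward alone.
    rng = np.random.default_rng(0)
    learner = RewardOnlyLearner(n_actions=2)
    a = learner.act(rng)
    learner.update(a, reward=1.0)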
“…In a recent work, Chakraborty and Sen [21] proposed modeling the learning environments induced by gradient ascent learners (WoLF-IGA, WoLF-PHC [7] and ReDVaLeR [6]) as MDPs. In the presence of a gradient ascent adversary, the learning algorithm, called MB-AIM-FSI, first creates a set of hypotheses about the model of the learning environment that can be induced by the learning adversary.…”
Section: Related Work
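The gradient ascent learners named here share the win-or-learn-fast (WoLF) idea: move the policy slowly when its expected value beats that of the historical average policy, and quickly when it does not. Below is a WoLF-PHC-style update for a stateless repeated game, a sketch assuming particular step sizes and in-place numpy updates; it is not Chakraborty and Sen's MB-AIM-FSI, only the kind of adaptive adversary that algorithm models.

    import numpy as np

    def wolf_phc_step(pi, avg_pi, q, action, reward, count,
                      alpha=0.1, delta_win=0.01, delta_lose=0.04):
        """One WoLF-PHC-style update for a stateless repeated game (sketch).

        pi     -- current mixed policy over actions (numpy array, sums to 1)
        avg_pi -- running average of past policies
        q      -- per-action value estimates
        count  -- number of updates performed so far (>= 1)
        The policy step is small when "winning" (pi's expected value is at
        least the average policy's) and larger when "losing".
        """
        q[action] += alpha * (reward - q[action])    # value estimate update
        avg_pi += (pi - avg_pi) / count              # track the average policy
        delta = delta_win if pi @ q >= avg_pi @ q else delta_lose
        best = int(np.argmax(q))
        give = np.minimum(pi, delta / (len(pi) - 1)) # mass ceded by non-best actions
        give[best] = 0.0
        pi -= give
        pi[best] += give.sum()                       # move that mass to the best action
        return pi, avg_pi, q

    # Example: two-action game, uniform initial policy, first update.
    pi, avg_pi, q = np.array([0.5, 0.5]), np.array([0.5, 0.5]), np.zeros(2)
    pi, avg_pi, q = wolf_phc_step(pi, avg_pi, q, action=0, reward=1.0, count=1)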
“…Certain approaches also require the observability of the actions taken by the other agents [2,3]. Finally, in less realistic settings, the strategies (i.e., the probability distributions over actions) or the rewards obtained by the other agents are also assumed to be observable [4-6].…”
Section: Introduction
“…Multi-Agent Reinforcement Learning (MARL) [5] provides a common approach to multi-object decision-making problems, allowing objects to adapt dynamically to changes in the IoT environment. Several MARL algorithms have been proposed [4,6-8], all of which have some theoretical results of convergence in general-sum games. A common assumption of these algorithms is that a player knows its own payoff matrix.…”
Section: Introduction