Polynomial Time Algorithms to Find an Approximate Competitive Equilibrium for Chores

Boodaghians, Shant; Chaudhury, Bhaskar Ray; Mehta, Ruta

doi:10.48550/arxiv.2107.06649

Cited by 1 publication

(10 citation statements)

References 18 publications

“…where $\nu_i$ is the random policy that assigns the uniform distribution on $A_i$ to each $s$. In words, agent $i$ plays the baseline policy with probability $1 - \rho_i$, and plays all actions uniformly with probability $\rho_i / |A_i|$. We denote by $\Pi$ the set of joint policies in the form of (10) for each agent, i.e., $\Pi$ :…”

Section: Weakly Acyclic Games Definition 2 a Policy π

mentioning

confidence: 99%
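
The perturbation described in the excerpt above can be sketched in a few lines of Python. This is only an illustration built from the verbal description, since equation (10) itself is not reproduced here. It assumes a deterministic baseline policy at a given state, and the names `perturbed_action_probs`, `sample_action`, `baseline_action`, `actions`, and `rho` are hypothetical.

```python
import random


def perturbed_action_probs(baseline_action, actions, rho):
    """Action distribution of the perturbed policy at a single state.

    Assumption (not from the source): the baseline policy is deterministic,
    so it puts all of its mass on `baseline_action`.
    """
    n = len(actions)
    # Uniform exploration: every action receives rho / |A_i|.
    probs = {a: rho / n for a in actions}
    # The baseline action additionally receives the remaining 1 - rho.
    probs[baseline_action] += 1.0 - rho
    return probs


def sample_action(baseline_action, actions, rho, rng=random):
    """Draw one action: explore uniformly with probability rho, else play baseline."""
    if rng.random() < rho:
        return rng.choice(actions)
    return baseline_action


probs = perturbed_action_probs("a", ["a", "b", "c", "d"], rho=0.2)
```

With four actions and `rho = 0.2`, each non-baseline action gets mass 0.05 and the baseline action gets 0.85, so the distribution sums to one.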

“…Proof of Lemma 2. Note that in the $k$th exploration phase, agents adopt the joint policy $\pi^k$ as defined in (10). Also by Assumptions 1 and 2, the (finite) Markov chain induced by the joint policy is irreducible and aperiodic.…”

Section: Weakly Acyclic Games Definition 2 a Policy π

mentioning

confidence: 99%

“…Our next goal is to bound the approximation error of policy perturbation. Recall the definition of the randomized policy in (10), and consider the joint policies of all agents except $i$. With probability $\prod_{j \neq i} (1 - \rho_j)$, all agents $j \neq i$ end up playing their baseline policies, which results in…”

Section: Weakly Acyclic Games Definition 2 a Policy π

mentioning

confidence: 99%
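
As a small worked illustration of the product in the excerpt above, the following sketch computes the probability that every agent other than $i$ takes the baseline branch of its perturbed policy. The function name `prob_others_play_baseline` and the list `rhos` of exploration probabilities are hypothetical, and the agents' randomizations are assumed to be independent.

```python
import math


def prob_others_play_baseline(rhos, i):
    """Probability that all agents j != i take the baseline branch.

    `rhos[j]` is agent j's exploration probability; agents are assumed
    to randomize independently of one another.
    """
    return math.prod(1.0 - rho for j, rho in enumerate(rhos) if j != i)


# Three agents; agent 0's own exploration probability does not enter the product.
p = prob_others_play_baseline([0.1, 0.2, 0.5], i=0)
```

Here the product is (1 − 0.2)(1 − 0.5) = 0.4. For exploration probabilities in [0, 1], the product is never smaller than one minus the sum of the other agents' exploration probabilities (0.3 in this example), which gives a simple lower bound when those probabilities are small.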

“…The same problem as in the tabular setting arises here: if all agents use (56) and select their actions with the $\epsilon$-greedy method, the environment becomes nonstationary and the convergence of the $\theta_i$'s is not guaranteed. In the same spirit of Algorithm 1, we let each agent play the behavior policy $\pi_i^k$ as defined in (10) during the $k$th exploration phase, so that the environment is stationary within each exploration phase. Instead of maintaining and updating a $|S||A_i|$-dimensional Q function, agent $i$ updates a $d$-dimensional vector $\theta_i$ according to (56).…”

Section: Numerical Experiments

mentioning

confidence: 99%

“…Roughly speaking, in a Markov perfect equilibrium, each agent's policy is a best reply (i.e., maximizes her own total (discounted) reward) to all other agents' joint policy. Mainstreams of research include analyzing the hardness to compute the equilibria (e.g., Daskalakis [17], Daskalakis et al. [19], Garg et al. [33]), approximating and analyzing the equilibria (e.g., Adsul et al. [1], Boodaghians et al. [10], Brânzei et al. [11]), designing algorithms to find the equilibria with the knowledge of the transitions and rewards (e.g., Hansen et al. [35], Hu and Wellman [39]) or without such knowledge (e.g., Arslan and Yüksel [3]).…”

mentioning

confidence: 99%

Finite-Sample Analysis of Decentralized Q-Learning for Stochastic Games

Gao, Ma, Başar et al. 2021

Preprint

Learning in stochastic games is arguably the most standard and fundamental setting in multi-agent reinforcement learning (MARL). In this paper, we consider decentralized MARL in stochastic games in the non-asymptotic regime. In particular, we establish the finite-sample complexity of fully decentralized Q-learning algorithms in a significant class of general-sum stochastic games (SGs), namely weakly acyclic SGs, which includes the common cooperative MARL setting with an identical reward to all agents (a Markov team problem) as a special case. We focus on the practical yet challenging setting of fully decentralized MARL, where neither the rewards nor the actions of other agents can be observed by each agent. In fact, each agent is completely oblivious to the presence of other decision makers. Both the tabular and the linear function approximation cases are considered. In the tabular setting, we analyze the sample complexity for the decentralized Q-learning algorithm to converge to a Markov perfect equilibrium (Nash equilibrium). With linear function approximation, the results are for convergence to a linear approximated equilibrium, a new notion of equilibrium that we propose, in which each agent's policy is a best reply (to other agents) within a linear space. Numerical experiments are also provided for both settings to demonstrate the results.
