Beyond $\log^2(T)$ Regret for Decentralized Bandits in Matching Markets

Basu, Soumya; Sankararaman, Karthik Abinav; Sankararaman, Abishek

doi:10.48550/arxiv.2103.07501

Cited by 3 publications

(8 citation statements)

References 19 publications

“…There is an emerging line of research on learning stable matchings with bandit feedback (Das and Kamenica, 2005; Liu et al, 2020, 2021; Sankararaman et al, 2021; Cen and Shah, 2021; Basu et al, 2021) using the mature tools from the bandit literature. Most of them focus on matchings with non-transferable utilities (Gale and Shapley, 1962), which fails to capture real-world markets with monetary transfers between agents, e.g., payments from passengers to drivers on ride-hailing platforms.…”

Section: Related Work (mentioning)

confidence: 99%

“…The data streams that arise from digital markets provide opportunities to cope with such challenges, via learning-based mechanism design. Recent work (Jagadeesan et al, 2021; Liu et al, 2021; Sankararaman et al, 2021; Basu et al, 2021) has begun to apply modern machine learning tools to problems in adaptive mechanism design. One particular area of focus in learning-aware market design has been matching, a class of problems central to microeconomics (Mas-Colell et al, 1995).…”

Section: Introduction (mentioning)

confidence: 99%

Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets

Min, Wang, Xu, et al. 2022

Preprint

We study a Markov matching market involving a planner and a set of strategic agents on the two sides of the market. At each step, the agents are presented with a dynamical context, where the contexts determine the utilities. The planner controls the transition of the contexts to maximize the cumulative social welfare, while the agents aim to find a myopic stable matching at each step. Such a setting captures a range of applications including ridesharing platforms. We formalize the problem by proposing a reinforcement learning framework that integrates optimistic value iteration with maximum weight matching. The proposed algorithm addresses the coupled challenges of sequential exploration, matching stability, and function approximation. We prove that the algorithm achieves sublinear regret.

“…While collisions can be used as communication tools between players in multiplayer bandits [Bistritz and Leshem, 2018, Boursier and Perchet, 2019, Mehrabian et al, 2020, Wang et al, 2020], this becomes harder with an asymmetric collision model as in competing bandits. However, some level of communication remains possible [Sankararaman et al, 2020, Basu et al, 2021]. In queuing systems, collisions are not only asymmetric, but depend on the age of the sent packets, making such solutions unsuited.…”

Section: Additional Related Work (mentioning)

confidence: 99%

Decentralized Learning in Online Queuing Systems

Sentenac, Boursier, Perchet, 2021

Preprint

Motivated by packet routing in computer networks, online queuing systems are composed of queues receiving packets at different rates. Repeatedly, they send packets to servers, each of which handles at most one packet at a time. In the centralized case, the number of accumulated packets remains bounded (i.e., the system is stable) as long as the ratio between service rates and arrival rates is larger than 1. In the decentralized case, individual no-regret strategies ensure stability when this ratio is larger than 2. Yet, myopically minimizing regret disregards the long term effects due to the carryover of packets to further rounds. On the other hand, minimizing long term costs leads to stable Nash equilibria as soon as the ratio exceeds $\frac{e}{e-1}$. Stability with decentralized learning strategies with a ratio below 2 was a major remaining question. We first argue that for ratios up to 2, cooperation is required for stability of learning strategies, as selfish minimization of policy regret, a patient notion of regret, might indeed still be unstable in this case. We therefore consider cooperative queues and propose the first decentralized learning algorithm guaranteeing stability of the system as long as the ratio of rates is larger than 1, thus reaching performances comparable to centralized strategies.

“…If, at a given round, multiple agents request the same firm, the firm, assumed to be a myopic utility maximizer, accepts the request of its most preferred agent (who receives a noisy measurement of their utility of the match from which they can learn their preferences) and rejects the others (who receive no information about their preferences). This setup has been studied in a line of recent works on online matching markets [LMJ20, LRMJ21, SBS21, BSS21].…”

Section: Introduction (mentioning)

confidence: 99%
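
The feedback rule described in the statement above can be sketched in a few lines. This is only an illustration under stated assumptions, not code from any of the cited papers: the function name `resolve_round`, the dictionary layout, and the Gaussian noise model are all hypothetical choices made for the example.

```python
import random


def resolve_round(requests, firm_prefs, mean_utility, noise_std=1.0, rng=None):
    """Resolve one round of requests in a two-sided matching market.

    requests:     dict agent -> firm the agent requests this round
    firm_prefs:   dict firm -> list of agents, most preferred first
    mean_utility: dict (agent, firm) -> mean utility of that match
    Returns dict agent -> noisy reward if accepted, None if rejected.

    All names and the Gaussian noise are assumptions for illustration.
    """
    rng = rng or random.Random()

    # Group the requesting agents by the firm they asked for.
    by_firm = {}
    for agent, firm in requests.items():
        by_firm.setdefault(firm, []).append(agent)

    # Rejected agents receive no information, modelled here as None.
    feedback = {agent: None for agent in requests}
    for firm, applicants in by_firm.items():
        # The firm accepts its most preferred applicant (lowest list index).
        winner = min(applicants, key=firm_prefs[firm].index)
        noise = rng.gauss(0.0, noise_std)
        feedback[winner] = mean_utility[(winner, firm)] + noise
    return feedback
```

Setting `noise_std=0.0` makes the outcome deterministic, which is convenient for checking who wins each collision; the asymmetry is visible in that only the accepted agent observes anything about its own utility.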

“…Successful algorithms for this framework must simultaneously solve a statistical learning problem (that of learning about their own preferences) and a competitive problem (ensuring that agents get their most desired match despite the presence of other self-interested agents in the market). Previous works for addressing this problem propose algorithms that are centralized [LMJ20] (whereby agents send their current beliefs over their preferences to a central platform which does the matching), require coordination between agents (i.e., a choreographed set of strategies to minimize rejections) [SBS21, BSS21], or require agents to fully observe the market outcomes of other agents [LRMJ21]. In contrast, the DA algorithm, which we take to be the full-information benchmark to which we compare algorithms, is (i) fully decentralized, (ii) coordination-free, and (iii) requires agents to make decisions only based upon their own history of rejections and successful matchings.…”

Section: Introduction (mentioning)

confidence: 99%
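
For readers unfamiliar with the DA benchmark named above, here is a minimal sketch of classical agent-proposing deferred acceptance (Gale and Shapley, 1962) with fully known preferences. It is written as a single simulation loop for brevity, so it does not reproduce the decentralized, history-only framing of the quoted paper; the function name and data layout are assumptions, and every agent is assumed to appear in each firm's preference list.

```python
def deferred_acceptance(agent_prefs, firm_prefs):
    """Agent-proposing deferred acceptance with known preferences.

    agent_prefs: dict agent -> list of firms, most preferred first
    firm_prefs:  dict firm -> list of agents, most preferred first
    Returns dict agent -> firm for every matched agent.
    """
    # Precompute each firm's ranking of agents (lower is better).
    rank = {
        firm: {agent: i for i, agent in enumerate(prefs)}
        for firm, prefs in firm_prefs.items()
    }
    next_choice = {agent: 0 for agent in agent_prefs}  # next firm to try
    held = {}  # firm -> agent it is tentatively holding
    free = list(agent_prefs)

    while free:
        agent = free.pop()
        if next_choice[agent] >= len(agent_prefs[agent]):
            continue  # agent exhausted its list and stays unmatched
        firm = agent_prefs[agent][next_choice[agent]]
        next_choice[agent] += 1

        current = held.get(firm)
        if current is None:
            held[firm] = agent  # firm was free: hold the proposer
        elif rank[firm][agent] < rank[firm][current]:
            held[firm] = agent  # firm trades up and releases its old match
            free.append(current)
        else:
            free.append(agent)  # rejected: agent will try its next firm

    return {agent: firm for firm, agent in held.items()}
```

Each agent only ever reacts to its own acceptances and rejections, which is the property the quoted statement emphasizes; the shared loop here is just a convenient way to simulate all agents at once.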

Decentralized, Communication- and Coordination-free Learning in Structured Matching Markets

Maheshwari, Mazumdar, Sastry, 2022

Preprint

We study the problem of online learning in competitive settings in the context of two-sided matching markets. In particular, one side of the market, the agents, must learn about their preferences over the other side, the firms, through repeated interaction while competing with other agents for successful matches. We propose a class of decentralized, communication- and coordination-free algorithms that agents can use to reach their stable match in structured matching markets. In contrast to prior works, the proposed algorithms make decisions based solely on an agent's own history of play and require no foreknowledge of the firms' preferences. Our algorithms are constructed by separating the statistical problem of learning one's preferences from noisy observations from the problem of competing for firms. We show that under realistic structural assumptions on the underlying preferences of the agents and firms, the proposed algorithms incur a regret which grows at most logarithmically in the time horizon. Our results show that, in the case of matching markets, competition need not drastically affect the performance of decentralized, communication- and coordination-free online learning algorithms.
