We initiate the study of efficient mechanism design with guaranteed good properties even when players participate in multiple different mechanisms simultaneously or sequentially. We define the class of smooth mechanisms, related to the smooth games defined by Roughgarden, which can be thought of as mechanisms that generate approximately market-clearing prices. We show that smooth mechanisms result in high-quality outcomes in equilibrium, both in the full-information setting and in the Bayesian setting with uncertainty about participants, as well as in learning outcomes. Our main result is that such mechanisms compose well: smoothness locally at each mechanism implies efficiency globally.

For mechanisms where good performance requires that bidders do not bid above their value, we identify the notion of a weakly smooth mechanism. Weakly smooth mechanisms, such as the Vickrey auction, are approximately efficient under the no-overbidding assumption. Like smooth mechanisms, weakly smooth mechanisms behave well under composition and have high-quality outcomes in equilibrium (assuming no overbidding), both in the full-information setting and in the Bayesian setting, as well as in learning outcomes.

In most of the paper we assume participants have quasilinear preferences. We also extend some of our results to settings where participants have budget constraints.
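For concreteness, a standard way to state the smoothness condition (a hedged paraphrase of the Syrgkanis–Tardos definition; the notation here is ours) is the following:

```latex
% A mechanism is (\lambda, \mu)-smooth if, for every valuation profile v,
% each player i has a (possibly randomized) deviation a_i^*(v) such that
% for every action profile a:
\sum_{i} u_i\big(a_i^*(v), a_{-i}; v_i\big)
  \;\ge\; \lambda \cdot \mathrm{OPT}(v) \;-\; \mu \sum_{i} P_i(a)
% where OPT(v) is the optimal welfare and P_i(a) is player i's payment.
% Every equilibrium of such a mechanism then achieves at least a
% \lambda / \max(1, \mu) fraction of the optimal welfare.
```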
We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal-form games. When each player in a game uses an algorithm from our class, individual regret decays at $O(T^{-3/4})$, while the sum of utilities converges to an approximate optimum at $O(T^{-1})$, an improvement upon the worst-case $O(T^{-1/2})$ rates. We give a blackbox reduction for any algorithm in the class to achieve $\tilde{O}(T^{-1/2})$ rates against an adversary, while maintaining the faster rates against algorithms in the class. Our results extend those of Rakhlin and Sridharan [18] and Daskalakis et al. [4], who only analyzed two-player zero-sum games for specific algorithms.
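A canonical member of this class is Optimistic Hedge: multiplicative weights where the most recent payoff vector is counted twice, which is the "recency bias" the abstract refers to. Below is a minimal sketch (the function name and the `payoff_fn` interface are ours, for illustration only):

```python
import numpy as np

def optimistic_hedge(payoff_fn, n_actions, T, eta=0.1):
    """Optimistic Hedge: multiplicative weights with a one-step recency bias.

    A minimal sketch of one algorithm from the class analyzed in the paper;
    payoff_fn(t, p) is assumed to return the full payoff vector for round t.
    """
    cum = np.zeros(n_actions)    # cumulative payoffs observed so far
    last = np.zeros(n_actions)   # last payoff vector, used as the "prediction"
    plays = []
    for t in range(T):
        # Recency bias: the most recent payoff vector is counted twice.
        logits = eta * (cum + last)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        plays.append(p)
        u = payoff_fn(t, p)      # observe this round's payoff vector
        cum += u
        last = u
    return plays
```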
Individual decision makers consume information revealed by previous decision makers and produce information that may help future decision makers. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as elsewhere, such as in medical decisions. Each decision maker, when required to select an action, would individually prefer to exploit: select the action with the highest expected reward conditional on her information. At the same time, each decision maker would prefer previous decision makers to explore, producing information about the rewards of the various actions. A social planner, by means of carefully designed information disclosure, can incentivize the agents to balance exploration and exploitation, maximizing social welfare.

We formulate this problem as a multi-armed bandit problem (and various generalizations thereof) under incentive-compatibility constraints induced by the agents' Bayesian priors. We design an incentive-compatible bandit algorithm for the social planner with asymptotically optimal regret. Further, we provide a blackbox reduction from an arbitrary multi-armed bandit algorithm to an incentive-compatible one, with only a constant multiplicative increase in regret. This reduction works for very general bandit settings, even ones that incorporate contexts and arbitrary partial feedback.
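The high-level idea behind such a reduction is to hide occasional exploration rounds among many exploitation rounds, so that an agent receiving a recommendation cannot tell which kind of round she is in. The sketch below illustrates only this idea; the interface, names, and phase-length choice are ours, and the actual incentive analysis depends on the agents' priors:

```python
import random

def bic_wrapper(base_alg, exploit_arm, pull, n_phases, phase_len):
    """Hidden-exploration sketch of a blackbox reduction (interface is ours).

    Each phase has phase_len rounds; exactly one uniformly random round
    follows the base bandit algorithm, and every other round exploits the
    posterior-best arm given the public history. For suitably long phases,
    following the recommendation is incentive compatible.
    """
    history = []
    for _ in range(n_phases):
        explore_round = random.randrange(phase_len)
        arm_from_alg = base_alg.next_arm()       # base algorithm's pick this phase
        for r in range(phase_len):
            if r == explore_round:
                arm = arm_from_alg
                reward = pull(arm)
                base_alg.update(arm, reward)     # only this round feeds the base algorithm
            else:
                arm = exploit_arm(history)       # exploit given the history so far
                reward = pull(arm)
            history.append((arm, reward))
    return history
```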
The main goal of this paper is to develop a theory of inference of player valuations from observed data in the generalized second price auction without relying on the Nash equilibrium assumption. Existing work in economics on inferring agent values from data relies on the assumption that all participants' strategies are best responses to the observed play of the other players, i.e., that they constitute a Nash equilibrium. In this paper, we show how to perform inference under a weaker assumption instead: that players use some form of no-regret learning. Learning outcomes have emerged in recent years as an attractive alternative to Nash equilibrium for analyzing game outcomes, modeling players who have not reached a stable equilibrium but instead use algorithmic learning, aiming to learn the best way to play from previous observations. In this paper we show how to infer the values of players who use algorithmic learning strategies. Such inference is an important first step before testing any learning-theoretic behavioral model on auction data. We apply our techniques to a dataset from Microsoft's sponsored search ad auction system.
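The inference principle can be sketched as follows: a value is consistent with the data if, under that value, no fixed alternative bid would have substantially outperformed the bidder's observed bids. This is a minimal grid-based sketch under that assumption; the `utility` interface and all names are ours:

```python
def rationalizable_values(utility, observed_bids, opponent_bids,
                          value_grid, bid_grid, eps=0.0):
    """Values consistent with eps-no-regret play (illustrative sketch).

    utility(bid, others, v): per-round utility of a bidder with value v.
    A candidate value v is kept if no fixed alternative bid would have
    improved the bidder's cumulative utility by more than eps.
    """
    keep = []
    for v in value_grid:
        realized = sum(utility(b, o, v)
                       for b, o in zip(observed_bids, opponent_bids))
        best_fixed = max(sum(utility(bp, o, v) for o in opponent_bids)
                         for bp in bid_grid)
        if best_fixed - realized <= eps:
            keep.append(v)
    return keep
```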
We study the quality of outcomes in repeated games when the population of players is dynamically changing and participants use learning algorithms to adapt to the changing environment. The price of anarchy was originally introduced to study the Nash equilibria of one-shot games. Many games studied in computer science, such as packet routing or ad auctions, are played repeatedly. Given the computational hardness of Nash equilibria, an attractive alternative in repeated-game settings is for players to use no-regret learning algorithms. The price of total anarchy considers the quality of such learning outcomes, but it assumes a steady environment and player population, which is rarely the case in online settings.

In this paper we analyze the efficiency of repeated games in dynamically changing environments. An important trait of learning behavior is its versatility in changing environments, provided the learning method used is adaptive, i.e., does not rely too heavily on experience from the distant past. We show that, in large classes of games, if players choose their strategies in a way that guarantees low adaptive regret, high social welfare is ensured, even under very frequent changes.

A main technical tool for our analysis is the existence of a solution to the welfare maximization problem that is both close to optimal and relatively stable over time. Such a solution serves as a benchmark in the efficiency analysis of learning outcomes. We show that such a stable and close-to-optimal solution exists for many problems, even in cases where the exact optimal solution can be very unstable. We further show that a sufficient condition for the existence of stable outcomes is the existence of a differentially private algorithm for the welfare maximization problem.
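The adaptive-regret notion referred to here can be stated as follows (a standard formulation, in the spirit of Hazan and Seshadhri's definition; the notation is ours): the learner must have low regret not just over the whole horizon, but on every contiguous time interval, which is what makes the guarantee robust to a changing environment.

```latex
% Adaptive regret: worst-case regret over any contiguous interval [t_1, t_2],
% measured against the best fixed strategy s^* for that interval:
\mathrm{AdaptiveRegret}(T) \;=\;
  \max_{1 \le t_1 \le t_2 \le T} \; \max_{s^*}
  \sum_{t=t_1}^{t_2} \Big( u\big(s^*, a_{-i}^{t}\big) - u\big(a_i^{t}, a_{-i}^{t}\big) \Big)
```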