Incentivizing exploration

Frazier, Peter I.; Kempe, David; Kleinberg, Jon; Kleinberg, Robert

doi:10.1145/2600057.2602897

Cited by 94 publications

(89 citation statements)

References 27 publications


“…Our results are related to recent work on incentivizing exploration in a bandit model: Frazier et al. (2014); Mansour et al. (…, 2016). These papers typically model a myopic decision-maker in each round and an informed, non-myopic principal who can influence the decision-maker to explore rather than exploit.…”

Section: Further Related Work (supporting)

confidence: 85%

Algorithmic Price Discrimination

Cummings, Devanur, Huang et al. (2020)

Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms


We consider a generalization of the third-degree price discrimination problem studied in Bergemann et al. (2015), where an intermediary between the buyer and the seller can design market segments to maximize any linear combination of consumer surplus and seller revenue. Unlike Bergemann et al. (2015), we assume that the intermediary has only partial information about the buyer's value. We consider three models of information, in increasing order of difficulty. In the first model, the intermediary's information allows him to construct a probability distribution of the buyer's value. Next, we consider the sample complexity model, where the intermediary sees only samples from this distribution. Finally, we consider a bandit online learning model, where the intermediary can observe only the buyer's past purchasing decisions rather than her exact value. For each of these models, we present algorithms to compute optimal or near-optimal market segmentations.

* Clearly, we need certain assumptions on the seller's behavior for any nontrivial result; there is not much we can do if the seller picks prices randomly all the time. Our assumptions can accommodate natural no-regret learning algorithms on the seller side, including the Upper-Confidence-Bound (UCB) algorithm and the Explore-then-Commit (ETC) algorithm.

Contributions to the Sample Complexity of Mechanism Design

Pioneered by Balcan et al. (2005), Elkind (2007), and Dhangwatnotai et al. (2015), and formalized by Cole and Roughgarden (2014), the sample complexity of mechanism design, in particular of the revenue maximization problem, has been a focal point in algorithmic game theory in the last few years (Morgenstern and Roughgarden, 2015; Balcan et al., 2016; Devanur et al., 2016; Morgenstern and Roughgarden, 2016; Hartline and Taggart, 2019; Cai and Daskalakis, 2017; Gonczarowski and Nisan, 2017; Gonczarowski and Weinberg, 2018; Huang et al., 2018b; Guo et al., 2019). This paper adds to the literature on the sample complexity of mechanism design in two ways. The first is conceptual: we formulate the first sample complexity problem from the viewpoint of an intermediary rather than the seller, and for the task of designing information dispersion rather than allocations and payments. We show impossibility results for the general case and, more importantly, identify sufficient conditions under which we derive positive algorithmic results.

Conceptually new models often lead to new technical challenges. Our second contribution is an algorithmic ingredient that tackles such a challenge. Let us start with a thought experiment: consider a more powerful intermediary who knows the true distributions, while the seller still acts according to beliefs formed from the observed samples. Does the problem become trivial? Can the intermediary simply run the optimal segmentation with respect to the true distributions and expect near-optimal outcomes?

The answers turn out to be negative. Consider a segment for which there are two price...
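To make the objects in this abstract concrete, here is a minimal Python sketch that evaluates a given segmentation. It assumes a discrete value distribution, a seller who posts the revenue-maximizing (monopoly) price in each segment with ties broken toward the lower price, and a buyer who buys whenever her value is at least the price. The function names and example numbers are hypothetical, and this only scores a segmentation; it is not the paper's algorithm for finding one.

```python
from fractions import Fraction as F

def outcome(dist, price):
    """Return (seller revenue, consumer surplus) when `price` is posted to a buyer
    whose value is drawn from `dist` (dict: value -> probability).
    Assumption: the buyer purchases whenever value >= price."""
    revenue = sum(p * price for v, p in dist.items() if v >= price)
    surplus = sum(p * (v - price) for v, p in dist.items() if v >= price)
    return revenue, surplus

def monopoly_price(dist):
    """Revenue-maximizing price among the support points; max() keeps the first
    maximizer, so scanning values in ascending order breaks ties toward low prices."""
    return max(sorted(dist), key=lambda q: outcome(dist, q)[0])

def evaluate_segmentation(segments, w_surplus, w_revenue):
    """segments: list of (mass, conditional value distribution) pairs.
    The seller prices each segment at its own monopoly price; the intermediary
    scores the result as w_surplus * surplus + w_revenue * revenue."""
    revenue = surplus = F(0)
    for mass, dist in segments:
        r, s = outcome(dist, monopoly_price(dist))
        revenue += mass * r
        surplus += mass * s
    return revenue, surplus, w_surplus * surplus + w_revenue * revenue

third = F(1, 3)
prior = {1: third, 2: third, 3: third}
# No segmentation: a single segment containing the whole market.
print(evaluate_segmentation([(1, prior)], 1, 1))
# Full disclosure: each value forms its own segment.
print(evaluate_segmentation([(third, {v: F(1)}) for v in (1, 2, 3)], 1, 1))
```

Under these assumptions, no segmentation gives revenue 4/3 and consumer surplus 1/3 (the seller prices at 2), while full disclosure lets the seller extract the entire surplus (revenue 2, surplus 0). Changing `w_surplus` and `w_revenue` changes which segmentation the intermediary prefers.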


“…The user can compute the expected cost of travelling along $P_1$ when the recommendation is $P_2$ according to the stationary distribution $P^{\pi_{\mathrm{opt}}}$ of the belief state $x$. By (11), it is larger than $c_M$; that is,…”

Section: Definition 1, Information Restriction Mechanism (IRM) (mentioning)

confidence: 99%

Recommending Paths: Follow or Not Follow?

Liu, Courcoubetis, Duan (2019)

IEEE INFOCOM 2019 - IEEE Conference on Computer Communications


Mobile social network applications constitute an important platform for traffic information sharing, helping users collect and share sensor information about the driving conditions they experience on the traveled path in real time. In this paper we analyse the simple but fundamental model of a platform choosing between two paths: one with a known deterministic travel cost, and another that alternates over time between low and high random-cost states, where these states are only partially observable and perform, respectively, better and worse on average than the fixed-cost path. The more users are routed over the stochastic path, the better the platform can infer its actual state and use it efficiently. At the Nash equilibrium, in many cases selfish users (who are allowed to access the information collected by the platform) will myopically disregard the platform's optimal suggestion to take the riskier path, leading to a suboptimal system without enough exploration of the stochastic path. We prove that if the past collected information is hidden from users, the system becomes incentive compatible, and even 'sophisticated' users (in the sense that they can fully reverse-engineer the platform's recommendation and derive the path-state distribution conditional on the recommendation) prefer to follow the platform's recommendations. In a more practical setting where the platform implements a model-free Q-learning algorithm to minimise the social travel cost, our analysis suggests that increasing the accuracy of the learning algorithm increases the range of system parameters for which sophisticated users follow the platform's recommendations, becoming fully incentive compatible in the limit. Finally, we extend the two-path model to include more stochastic paths and show that incentive compatibility holds under our information restriction mechanism.
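The myopic behavior described in this abstract can be illustrated with a toy Python sketch of a two-state Markov chain for the stochastic path. The transition probabilities, costs, and function names are hypothetical, and the sketch covers only the belief prediction step (no new observation) and a myopic user's choice; it is not the paper's partial-observability model, its Q-learning setup, or its information restriction mechanism.

```python
from fractions import Fraction as F

def predict(b_low, p_lh, p_hl):
    """Propagate the belief P(state = low) one step without a new observation.
    p_lh = P(low -> high), p_hl = P(high -> low) per step (hypothetical values)."""
    return b_low * (1 - p_lh) + (1 - b_low) * p_hl

def stationary_low(p_lh, p_hl):
    """Long-run fraction of time the stochastic path spends in the low-cost state."""
    return p_hl / (p_lh + p_hl)

def myopic_choice(b_low, c_low, c_high, c_fixed):
    """A myopic user takes the stochastic path only if its expected cost under
    her current belief is strictly below the fixed path's cost."""
    expected = b_low * c_low + (1 - b_low) * c_high
    return "stochastic" if expected < c_fixed else "fixed"

p_lh, p_hl = F(1, 4), F(1, 2)
c_low, c_high, c_fixed = 1, 4, F(5, 2)
pi_low = stationary_low(p_lh, p_hl)            # 2/3
assert predict(pi_low, p_lh, p_hl) == pi_low   # the stationary belief is a fixed point
b = F(0)  # the path was just observed in its high-cost state
for step in range(4):
    print(step, b, myopic_choice(b, c_low, c_high, c_fixed))
    b = predict(b, p_lh, p_hl)
```

With these numbers, after the high state is observed, myopic users avoid the stochastic path for two steps until the predicted belief recovers to 5/8. In this toy setting nobody travels the path in the meantime, so no new observation of it arrives, which is the kind of under-exploration the abstract attributes to myopic users.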


“…Frazier et al. [7] consider a model with monetary transfers, where the social planner can pay agents to explore. Che and Hörner [3] consider a setting with two binary-valued actions, a continuous information flow, and a continuum of agents.…”
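In a model with monetary transfers like the one attributed to Frazier et al. [7] above, a myopic agent picks the arm that maximizes its current posterior mean reward plus any payment offered. The following minimal Python sketch computes the smallest single-round payment that makes such an agent pull a target arm. It assumes risk-neutral agents, Bernoulli arms with a uniform Beta(1, 1) prior, a payment attached only to the target arm, and ties resolved in the principal's favor; the history and function names are hypothetical, and the sketch does not model the long-run trade-off between payments and reward studied in that work.

```python
from fractions import Fraction as F

def posterior_mean(successes, pulls):
    """Posterior mean of a Bernoulli arm under a uniform Beta(1, 1) prior."""
    return F(successes + 1, pulls + 2)

def min_payment(means, target):
    """Smallest payment for pulling arm `target` that makes a myopic, risk-neutral
    agent (weakly) prefer it to the arm with the highest posterior mean."""
    return max(F(0), max(means) - means[target])

def follows(means, target, payment):
    """True if the myopic agent pulls `target` when offered `payment`;
    ties are resolved in the principal's favor (assumption)."""
    return means[target] + payment >= max(means)

# Hypothetical history: (successes, pulls) for three arms.
history = [(6, 8), (0, 0), (2, 3)]
means = [posterior_mean(s, n) for s, n in history]  # 7/10, 1/2, 3/5
pay = min_payment(means, target=1)                  # 1/5 to explore the unpulled arm
print(pay, follows(means, 1, pay), follows(means, 1, pay / 2))
```

The payment needed equals the gap between the myopically best posterior mean and the target arm's posterior mean, so exploring an arm that currently looks worse is exactly what the principal has to pay for.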

Section: Related Work (mentioning)

confidence: 99%

“…The planner can induce exploration in many ways. The simplest is using monetary transfers, paying the agents in order to explore (for example, Frazier et al. [7]). We are interested in the case when the social planner is unable to use, or prefers to avoid, any monetary transfers.…”

Section: Introduction (mentioning)

confidence: 99%

Optimal Algorithm for Bayesian Incentive-Compatible Exploration

Cohen, Mansour (2019)

Proceedings of the 2019 ACM Conference on Economics and Computation


We consider a social planner faced with a stream of myopic selfish agents. The goal of the social planner is to maximize social welfare; however, it is limited to using only information asymmetry (regarding previous outcomes) and cannot use any monetary incentives. The planner recommends actions to agents, but her recommendations need to be Bayesian incentive compatible to be followed by the agents. Our main result is an optimal algorithm for the planner in the case that the action realizations are deterministic and have limited support, making significant progress on this open problem. Our optimal protocol has two interesting features. First, it always completes the exploration of a priori more beneficial actions before exploring a priori less beneficial actions. Second, the randomization in the protocol is correlated across agents and actions (and not independent at each decision time).

(This can be due to regulatory constraints, business model, social norms, or any other reason.) The main advantage of the planner in our model is information asymmetry, namely, the fact that the planner has much more information than the agents. As a motivating example of information asymmetry, consider a GPS driving application. The application (social planner) recommends to drivers (agents) the best route to drive (action), given the changing road delays, and observes the actual road delays when a route is driven. While the application can recommend driving routes, ultimately the driver decides which route to actually drive. The application periodically needs to send drivers on exploratory routes, where it is uncertain about the actual delay, in order to observe that delay. The driver is aware that the application has updated information regarding the current delays on various roads.
For this reason, the driver would be willing to follow the recommendation even if she knows that there is a small probability that she is being asked to explore. At the other extreme, if the driver assumed that a recommended route has a higher delay with high probability, she might drive an alternate route. This inherent balancing of exploration and exploitation while satisfying agents' incentives is at the core of this work.

The abstract model that we consider is the following. There is a finite set of actions, and for each action there is a prior distribution on its rewards. A social planner faces a sequence of myopic selfish agents, and each agent appears only once. The social planner would like to maximize social welfare, the sum of the agents' utilities. The social planner recommends an action to each agent, and if the recommendation is Bayesian incentive compatible (henceforth, BIC), the agent follows it. This model was presented in Kremer et al. [10] and studied in [11][12][13]. The work of Kremer et al. [10] presented an optimal algorithm for the social planner in the case of two actions with deterministic outcomes. (Deterministic outcome implies that each time the ...
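The BIC constraint in the two-action, deterministic-outcome setting of Kremer et al. [10] can be checked with a short Python sketch. It assumes independent priors with E[R1] > E[R2], that agent 1 is sent to action 1, and that agent 2 is recommended action 2 exactly when the observed R1 lies in a set S. Because R2 is independent of R1, recommending action 2 is BIC iff E[(mu2 - R1) * 1{R1 in S}] >= 0, where mu2 = E[R2]. The sketch grows S from the smallest values of R1 upward while this holds; it omits the randomized inclusion of a boundary value that can enlarge S further, and it is not the optimal algorithm of this paper. The function name and example prior are hypothetical.

```python
from fractions import Fraction as F

def exploration_set(prior1, mu2):
    """prior1: dict mapping each possible value of R1 to its probability.
    mu2: prior mean of R2 (R2 assumed independent of R1).
    Returns (S, slack): agent 2 is told to take action 2 iff R1 is in S, and
    slack = E[(mu2 - R1) * 1{R1 in S}] >= 0 certifies that this is BIC."""
    mu1 = sum(v * p for v, p in prior1.items())
    assert mu1 > mu2, "action 1 must be a priori better"
    S, slack = [], F(0)
    for v in sorted(prior1):          # low R1 values are the cheapest to explore on
        gain = (mu2 - v) * prior1[v]
        if slack + gain < 0:          # adding v would break the BIC constraint
            break
        slack += gain
        S.append(v)
    # The constraint for recommending action 1, E[(R1 - mu2) * 1{R1 not in S}] >= 0,
    # also holds: it equals (mu1 - mu2) + slack > 0.
    assert sum((v - mu2) * p for v, p in prior1.items() if v not in S) >= 0
    return S, slack

prior1 = {v: F(1, 4) for v in (0, 1, 2, 3)}   # E[R1] = 3/2
S, slack = exploration_set(prior1, mu2=F(1))  # e.g. R2 uniform on {0, 2}
print(S, slack)
```

With R1 uniform on {0, 1, 2, 3} and E[R2] = 1, agent 2 is asked to try action 2 unless R1 = 3, i.e. in three out of four cases, even though an agent who saw R1 would strictly prefer action 2 only when R1 = 0. Hiding the observed outcome is what lets the planner recommend exploration this often.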
