We consider online no-regret learning in unknown games with bandit feedback, where each agent only observes its reward at each time, determined by all players' current joint action, rather than its gradient. We focus on the class of smooth and strongly monotone games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct an online bandit convex optimization algorithm and show that it achieves the single-agent optimal regret of Θ(√T) under smooth and strongly concave payoff functions. We then show that if each agent applies this no-regret learning algorithm in strongly monotone games, the joint action converges in the last iterate to the unique Nash equilibrium at a rate of Θ(1/√T). Prior to our work, the best known convergence rate in the same class of games was O(1/T^{1/3}) (achieved by a different algorithm), leaving open the problem of optimal no-regret learning algorithms (since the known lower bound is Ω(1/√T)). Our results settle this open problem and contribute to the broad landscape of bandit game-theoretic learning by identifying the first doubly optimal bandit learning algorithm, in that it achieves (up to log factors) both the optimal regret in single-agent learning and the optimal last-iterate convergence rate in multi-agent learning. We also present results from several simulation studies (Cournot competition, Kelly auctions, and distributed regularized logistic regression) to demonstrate the efficacy of our algorithm.
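For intuition, the sketch below illustrates the classical one-point (single-observation) gradient estimator that underlies bandit convex optimization of this kind. It is not the paper's algorithm: the paper shapes the perturbation and update with a self-concordant barrier, whereas this toy uses a plain spherical perturbation and Euclidean projection, and all step-size and exploration schedules are illustrative assumptions.

```python
# A minimal sketch of one-point bandit gradient estimation (Flaxman-style), NOT the paper's
# barrier-based algorithm. Step-size and perturbation schedules below are illustrative.
import numpy as np

def bandit_gradient_ascent(reward_fn, x0, T, radius=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = len(x)
    for t in range(1, T + 1):
        delta = 0.5 / t ** 0.25                 # shrinking exploration radius
        eta = 1.0 / t                           # decaying step size (strong-concavity regime)
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                  # uniform direction on the unit sphere
        r = reward_fn(x + delta * u)            # single bandit (zeroth-order) observation
        g_hat = (d / delta) * r * u             # one-point gradient estimate
        x = x + eta * g_hat                     # gradient ascent step
        norm = np.linalg.norm(x)
        if norm > radius:                       # project back onto the feasible ball
            x *= radius / norm
    return x

# Toy usage: a smooth, strongly concave reward maximized at 0.3 * ones
opt = 0.3 * np.ones(3)
print(bandit_gradient_ascent(lambda a: -np.sum((a - opt) ** 2), np.zeros(3), T=20000))
```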
We consider an advertiser seeking to find the best possible audience-ad combination for its campaign via online experimentation. The problem of finding the best audience-ad combination is complicated by a number of distinctive challenges, including (a) a need for active exploration to resolve prior uncertainty and to speed the search for profitable combinations, (b) many combinations to choose from, giving rise to high-dimensional search formulations, and (c) very low success probabilities, typically just a fraction of one percent. Our algorithm (designated LRDL, an acronym for Logistic Regression with Debiased Lasso) addresses these challenges by combining four elements: a multi-armed bandit framework for active exploration; a Lasso penalty function to handle high dimensionality; an inbuilt debiasing kernel that handles the regularization bias induced by the Lasso; and a semi-parametric regression model for outcomes that promotes cross-learning across arms. The algorithm is implemented as a Thompson sampler and, to the best of our knowledge, it is the first that can practically address all of the challenges above. Simulations with real and synthetic data show that the method is effective and document its superior performance against several benchmarks from the recent high-dimensional bandit literature.
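As a rough illustration of the Lasso-plus-bandit idea, the sketch below drives arm selection with an L1-penalized logistic regression fit on a bootstrap resample of the history (a common Thompson-sampling approximation). It omits the debiasing kernel and the semi-parametric cross-learning model that define LRDL; the helper choose_arm and all tuning constants are hypothetical, not the authors' code.

```python
# Simplified sketch in the spirit of LRDL: Lasso-penalized logistic regression over
# high-dimensional arm features, with Thompson-style exploration approximated by a
# bootstrap resample of the history. Debiasing and cross-learning are omitted.
import numpy as np
from sklearn.linear_model import LogisticRegression

def choose_arm(arm_features, X_hist, y_hist, rng):
    """arm_features: (n_arms, d) array; X_hist, y_hist: past feature vectors and 0/1 outcomes."""
    y_hist = np.asarray(y_hist)
    if len(y_hist) < 20 or y_hist.sum() in (0, len(y_hist)):
        return int(rng.integers(len(arm_features)))          # cold start: explore uniformly
    idx = rng.integers(0, len(y_hist), size=len(y_hist))     # bootstrap resample of the history
    if y_hist[idx].sum() in (0, y_hist[idx].size):            # degenerate resample: use full data
        idx = np.arange(len(y_hist))
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)  # Lasso-penalized fit
    model.fit(np.asarray(X_hist)[idx], y_hist[idx])
    p_hat = model.predict_proba(np.asarray(arm_features))[:, 1]  # estimated success probabilities
    return int(np.argmax(p_hat))                              # play the sampled-model best arm
```

Here rng is a numpy.random.Generator (e.g., np.random.default_rng()), and C sets the strength of the Lasso penalty; both are illustrative choices.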
We study the implications of selling through a voice-based virtual assistant. The seller has a set of products available, and the virtual assistant dynamically decides which product to offer in each sequential interaction and at what price. The virtual assistant may maximize the seller's profits; it may be altruistic, maximizing total surplus; or it may serve as a consumer agent, maximizing the consumer surplus. The consumer is impatient and rational, seeking to maximize her expected utility given the information available to her. The virtual assistant selects products based on the consumer's request and other information available to it (e.g., consumer profile information) and presents them sequentially. Once a product is presented and priced, the consumer evaluates it and decides whether to make a purchase. The consumer's valuation of each product comprises a pre-evaluation value, which is common knowledge to the consumer and the virtual assistant, and a post-evaluation component, which is private to the consumer. We solve for the equilibria and develop efficient algorithms for implementing the solution. In the special case where the private information is exponentially distributed, the total surplus under profit maximization is split equally between the consumer and the seller, and the profit-maximizing ranking also maximizes the consumer surplus. We examine the effects of information asymmetry on the outcomes and study how incentive misalignment depends on the distribution of private valuations. We find that monotone rankings are optimal when the consumer is either highly patient or highly impatient and provide a good approximation for other patience levels. The relationship between products' expected valuations and prices depends on the consumer's patience level and is monotone increasing (decreasing) when the consumer is highly impatient (patient). The seller's share of total surplus also decreases in the amount of private information. We compare the virtual assistant to a traditional web-based interface, where multiple products are presented simultaneously on each page. We find that within a page, higher-value products are priced lower than lower-value products when the private valuations are exponentially distributed. This is because increasing one product's valuation increases the matching probability of the other products on the same page, which in turn increases their prices. Finally, the web-based interface generally achieves higher profits for the seller than the virtual assistant due to the greater commitment power inherent in its presentation.
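As a back-of-the-envelope check of the exponential-valuation result, the snippet below considers a stripped-down single-offer, myopic-consumer version of the model (an assumption for illustration, not the paper's full sequential equilibrium): with a pre-evaluation value v known to both sides and a private post-evaluation value eps ~ Exp(lam), the numerically profit-maximizing price leaves the seller and the consumer with equal expected surplus.

```python
# Illustrative numeric check (single offer, myopic consumer, illustrative parameters):
# the consumer buys if v + eps >= p, so expected profit is p * exp(-lam * (p - v)) for p >= v.
import numpy as np

lam, v = 2.0, 0.2                                        # v < 1/lam so the interior optimum applies
prices = np.linspace(v, v + 5 / lam, 2000)
profit = prices * np.exp(-lam * (prices - v))            # p * P(buy)
p_star = prices[np.argmax(profit)]                       # numerically optimal price (about 1/lam)

rng = np.random.default_rng(0)
eps = rng.exponential(1 / lam, size=1_000_000)           # private post-evaluation values
buy = v + eps >= p_star
seller = p_star * buy.mean()                             # expected seller profit
consumer = np.mean(np.where(buy, v + eps - p_star, 0.0)) # expected consumer surplus
print(p_star, seller, consumer)                          # the two surplus shares coincide
```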