Learning to Bid Without Knowing your Value

Feng, Zhe; Podimata, Chara; Syrgkanis, Vasilis

doi:10.1145/3219166.3219208

Cited by 36 publications

(34 citation statements)

References 31 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…We run UCBid1+ on discrete examples. In this case, we compare it to UCB on a discretization of [0, 1] and to WinExp, a generalization of Exp3 for the problem of learning to bid [Feng et al, 2018]. In this section we focus on two particular instances of the first price auction learning problem.…”

Section: Methods For Discrete Distributionsmentioning

confidence: 99%

“…Han et al [2020] provide new algorithms for this setting which have a regret of the order of √ T . A setting somewhat closer to ours is studied by Feng et al [2018]. This work deals with the setting of a bid in an adversarial fashion, when the other bids are revealed at each time step and the value is revealed only upon winning an auction.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Fast Rate Learning in Stochastic First Price Bidding

Achddou¹,

Cappé²,

Garivier³

2021

Preprint

View full text Add to dashboard Cite

First-price auctions have largely replaced traditional bidding approaches based on Vickrey auctions in programmatic advertising. As far as learning is concerned, firstprice auctions are more challenging because the optimal bidding strategy does not only depend on the value of the item but also requires some knowledge of the other bids. They have already given rise to several works in sequential learning, many of which consider models for which the value of the buyer or the opponents' maximal bid is chosen in an adversarial manner. Even in the simplest settings, this gives rise to algorithms whose regret grows as √ T with respect to the time horizon T . Focusing on the case where the buyer plays against a stationary stochastic environment, we show how to achieve significantly lower regret: when the opponents' maximal bid distribution is known we provide an algorithm whose regret can be as low as log 2 (T ); in the case where the distribution must be learnt sequentially, a generalization of this algorithm can achieve T 1/3+ regret, for any > 0. To obtain these results, we introduce two novel ideas that can be of interest in their own right. First, by tranposing results obtained in the posted price setting, we provide conditions under which the first-price biding utility is locally quadratic around its optimum. Second, we leverage the observation that, on small subintervals, the concentration of the variations of the empirical distribution function may be controlled more accurately than by using the classical Dvoretzky-Kiefer-Wolfowitz inequality. Numerical simulations confirm that our algorithms converge much faster than alternatives proposed in the literature for various bid distributions, including for bids collected on an actual programmatic advertising platform.

show abstract

Section: Methods For Discrete Distributionsmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Fast Rate Learning in Stochastic First Price Bidding

Achddou¹,

Cappé²,

Garivier³

2021

Preprint

View full text Add to dashboard Cite

show abstract

“…Online learning in auctions Many works on online learning in auctions are about "learning to bid", focusing on how to design no-regret algorithms for a bidder to bid in various formats of repeated auctions, including first price auctions (Balseiro et al, 2019;Han et al, 2020), second price auctions (Iyer et al, 2014;Weed et al, 2016), and more general auctions (Feng et al, 2018;Karaca et al, 2020). These works take the perspective of a single bidder, without considering the interaction among multiple bidders all of whom learn to bid at the same time.…”

Section: Related Workmentioning

confidence: 99%

Nash Convergence of Mean-Based Learning Algorithms in First Price Auctions

Deng¹,

Xiong²,

Lin³

et al. 2021

Preprint

View full text Add to dashboard Cite

We consider repeated first price auctions where each bidder, having a deterministic type, learns to bid using a mean-based learning algorithm. We completely characterize the Nash convergence property of the bidding dynamics in two senses: (1) time-average: the fraction of rounds where bidders play a Nash equilibrium approaches to 1 in the limit; (2) last-iterate: the mixed strategy profile of bidders approaches to a Nash equilibrium in the limit. Specifically, the results depend on the number of bidders with the highest value:• If the number is at least three, the bidding dynamics almost surely converges to a Nash equilibrium of the auction, both in time-average and in last-iterate.• If the number is two, the bidding dynamics almost surely converges to a Nash equilibrium in time-average but not necessarily in last-iterate.• If the number is one, the bidding dynamics may not converge to a Nash equilibrium in time-average nor in last-iterate.Our discovery opens up new possibilities in the study of convergence dynamics of learning algorithms.

show abstract

“…a uniform interval [0, 1]). Utilizing the discretization result in [16,15,12], let B be the discretization of continuous bid space B and DE(B, B) to represent the discretization error of bid space B and B such that In practice, the expected allocation function g * and payment function p * are both relatively smooth (see e.g.. the plots in [18]). Assume the Lipschitzness of expected allocation function and payment function, the discretization error DE(B, B) can be easily bounded, then our Pseudo-Regret analysis can be directly applied in this continuous bid space.…”

Section: Continuous Bids Spacementioning

confidence: 99%

Online Learning for Measuring Incentive Compatibility in Ad Auctions?

Feng

Schrijvers

Sodomka

2019

The World Wide Web Conference

Self Cite

View full text Add to dashboard Cite

In this paper we investigate the problem of measuring end-to-end Incentive Compatibility (IC) regret given black-box access to an auction mechanism. Our goal is to 1) compute an estimate for IC regret in an auction, 2) provide a measure of certainty around the estimate of IC regret, and 3) minimize the time it takes to arrive at an accurate estimate. We consider two main problems, with different informational assumptions: In the advertiser problem the goal is to measure IC regret for some known valuation v, while in the more general demandside platform (DSP) problem we wish to determine the worst-case IC regret over all possible valuations. The problems are naturally phrased in an online learning model and we design Regret-UCB algorithms for both problems. We give an online learning algorithm where for the advertiser problem the error of determining IC shrinks as O |B| T · ln T n + ln T n (where B is the finite set of bids, T is the number of time steps, and n is number of auctions per time step), and for the DSP problem it shrinks as O |B| T · |B| ln T n + |B| ln T n . For the DSP problem, we also consider stronger IC regret estimation and extend our Regret-UCB algorithm to achieve better IC regret error. We validate the theoretical results using simulations with Generalized Second Price (GSP) auctions, which are known to not be incentive compatible and thus have strictly positive IC regret.

show abstract

Learning to Bid Without Knowing your Value

Cited by 36 publications

References 31 publications

Fast Rate Learning in Stochastic First Price Bidding

Fast Rate Learning in Stochastic First Price Bidding

Nash Convergence of Mean-Based Learning Algorithms in First Price Auctions

Online Learning for Measuring Incentive Compatibility in Ad Auctions?

Contact Info

Product

Resources

About