2017
DOI: 10.1007/978-3-319-69179-4_50

Improving Real-Time Bidding Using a Constrained Markov Decision Process

Abstract: Online advertising is increasingly switching to real-time bidding on advertisement inventory, in which ad slots are sold through real-time auctions as users visit websites or use mobile apps. To compete with unknown bidders in such a highly stochastic environment, each bidder must estimate the value of each impression and set a competitive bid price. Previous bidding algorithms have done so without considering the constraint of budget limits, which we address in this paper. We…
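To make the setting the abstract describes concrete, here is a minimal sketch of a budget-constrained bidding loop. Every name in it (estimate_ctr, base_bid, the linear bid rule) is an illustrative assumption, not the paper's method; the paper formulates the problem as a constrained MDP rather than a static bid rule like this one.

```python
import random

def estimate_ctr(features: float) -> float:
    # Stub CTR estimator; in practice this would be a learned model
    # (e.g. logistic regression over impression features).
    return min(0.01, 0.001 * (1.0 + features))

def run_campaign(impressions, budget: float, base_bid: float = 2.0,
                 avg_ctr: float = 0.001):
    """Bid on a stream of (features, market_price) impressions under a budget.

    A second-price auction is assumed: the winner pays the highest
    competing bid (`market_price`).
    """
    spend, expected_clicks = 0.0, 0.0
    for features, market_price in impressions:
        ctr = estimate_ctr(features)       # estimated value of the impression
        bid = base_bid * ctr / avg_ctr     # simple linear bidding heuristic
        bid = min(bid, budget - spend)     # never bid beyond the remaining budget
        if bid > market_price:             # auction won
            spend += market_price          # pay the second price
            expected_clicks += ctr
    return spend, expected_clicks

# Hypothetical usage with synthetic impressions:
stream = ((random.random(), random.uniform(0.5, 3.0)) for _ in range(10_000))
print(run_campaign(stream, budget=500.0))
```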

Cited by 16 publications (17 citation statements)
References 15 publications

Citation statements:
“…The performance of their methodology was also evaluated using ten dynamic CPM campaigns, and the gains in conversions (CPA) and clicks (CPC) were 30.9% and 19.0%, respectively. Also relevant in this context is the publication of Du et al. [28], in which they improved RTB performance through a Constrained Markov Decision Process (CMDP) based on a reinforcement learning framework. A distributed representation model is used to estimate the CTR, where the estimated CTR is the state, the bid price is the action, and clicks are the reward.…”
Section: Performance Comparison With State-of-the-Art Methods (mentioning)
confidence: 99%
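As a concrete reading of that state/action/reward decomposition, the sketch below frames one auction step in reinforcement-learning terms. The discretization, the tabular Q update, and all names are illustrative assumptions; the paper's actual CMDP solution additionally enforces the budget constraint, which a plain Q-learning update like this does not.

```python
import numpy as np

N_STATES, N_ACTIONS = 10, 20           # discretized CTR bins and bid prices
BID_PRICES = np.linspace(0.1, 2.0, N_ACTIONS)
Q = np.zeros((N_STATES, N_ACTIONS))    # tabular action-value estimates

def to_state(estimated_ctr: float) -> int:
    # State = the estimated CTR, discretized into bins (illustrative choice).
    return min(N_STATES - 1, int(estimated_ctr / 0.01 * N_STATES))

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    # One unconstrained Q-learning step; a CMDP solver would additionally
    # account for the expected budget consumption of each action.
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])

# Hypothetical single step: observe state, pick a bid, learn from the click.
s = to_state(0.004)
a = int(Q[s].argmax())                 # bid BID_PRICES[a]
q_update(s, a, reward=1.0, next_state=to_state(0.003))
```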
“…$\alpha \log \pi_{\phi}(\hat{a}_t \mid s_t)$ is the entropy term, and the temperature parameter $\alpha$ is automatically adjusted via formula (16) to control the stochasticity of the optimal policy. We can therefore update the Policy network's parameters using the unbiased gradient estimator proposed in SAC, as shown in formula (14). It is worth noting that the agent uses the smaller of the two Q values to update the Policy network, which avoids overestimation.…”
Section: Solution Based on SAC (mentioning)
confidence: 99%
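For readers less familiar with SAC, the PyTorch-style sketch below shows the actor loss that passage describes: the entropy term $\alpha \log \pi_{\phi}(\hat{a}_t \mid s_t)$ and the minimum over twin Q estimates that curbs overestimation. The network interfaces and the temperature-tuning step (the paper's formulas (14) and (16)) are sketched from the standard SAC algorithm, not taken from this specific paper.

```python
import torch

def actor_loss(policy, q1, q2, states, log_alpha):
    """One SAC policy-update step: minimize E[alpha * log pi(a|s) - min(Q1, Q2)].

    `policy(states)` is assumed to return a reparameterized action sample and
    its log-probability; `q1`/`q2` are the twin critics.
    """
    actions, log_prob = policy(states)         # a_hat ~ pi_phi(.|s), reparameterized
    q_min = torch.min(q1(states, actions),     # take the smaller of the two
                      q2(states, actions))     # Q values to avoid overestimation
    alpha = log_alpha.exp().detach()           # temperature, tuned separately
    return (alpha * log_prob - q_min).mean()

def temperature_loss(log_alpha, log_prob, target_entropy):
    # Standard automatic temperature adjustment (cf. the cited formula (16)):
    # drive the policy's entropy toward a fixed target value.
    return -(log_alpha * (log_prob + target_entropy).detach()).mean()
```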
“…Furthermore, we model the adjustment-factor decisions for ad impressions over an ad delivery period as an MDP [14]. The RL agent's task is therefore to learn the optimal policy for generating adjustment factors.…”
Section: Introduction (mentioning)
confidence: 99%
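A minimal way to picture that setup: each decision step emits an adjustment factor that scales a base bid, and one episode spans a delivery period. All fields and names below are hypothetical; the cited work [14] defines the actual state and reward.

```python
from dataclasses import dataclass

@dataclass
class DeliveryStep:
    # One MDP step within an ad delivery period (hypothetical fields).
    remaining_budget: float   # part of the state
    time_left: int            # part of the state
    adjustment_factor: float  # the action: a multiplier on the base bid
    reward: float             # feedback, e.g. clicks or conversions

def adjusted_bid(base_bid: float, adjustment_factor: float) -> float:
    # The policy outputs the factor; the bid actually submitted is the product.
    return base_bid * adjustment_factor
```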
“…CMDPs can be used to model a wide variety of real problems. For instance, they can be used to maximize revenue in online advertising while respecting budget limits (Du et al., 2017), or, in robot control, to maximize the probability of reaching a target location within a temporal deadline (Carpin et al., 2014). As explained in Section 5, a CMDP is also used to model the problem considered in this paper.…”
Section: Background on Reinforcement Learning (mentioning)
confidence: 99%
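As a reference point for that passage, the standard CMDP objective can be written as follows. This is the textbook formulation (reward maximized subject to a bound on expected cost, e.g. a budget limit), not notation taken from any of the cited papers.

```latex
% Standard constrained-MDP objective: maximize expected discounted reward
% subject to a bound B on expected discounted cost (e.g. ad spend).
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le B
```

Here $r$ plays the role of clicks or revenue and $c$ the per-step spend, which recovers the budget-limited bidding setting the quoted sentence attributes to Du et al. (2017).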