2017
DOI: 10.1007/978-3-319-69179-4_50

Improving Real-Time Bidding Using a Constrained Markov Decision Process

Abstract: Online advertising is increasingly switching to real-time bidding on advertisement inventory, in which ad slots are sold through real-time auctions as users visit websites or use mobile apps. To compete with unknown bidders in such a highly stochastic environment, each bidder must estimate the value of each impression and set a competitive bid price. Previous bidding algorithms have done so without considering the constraint of budget limits, which we address in this paper. We…
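To make the setting the abstract describes concrete, here is a minimal sketch of a budget-constrained bidding loop. Every name in it (estimate_ctr, base_bid, the linear bid rule) is an illustrative assumption, not the paper's method; the paper formulates the problem as a constrained MDP rather than a static bid rule like this one.

```python
import random

def estimate_ctr(features: float) -> float:
    # Stub CTR estimator; in practice this would be a learned model
    # (e.g. logistic regression over impression features).
    return min(0.01, 0.001 * (1.0 + features))

def run_campaign(impressions, budget: float, base_bid: float = 2.0,
                 avg_ctr: float = 0.001):
    """Bid on a stream of (features, market_price) impressions under a budget.

    A second-price auction is assumed: the winner pays the highest
    competing bid (`market_price`).
    """
    spend, expected_clicks = 0.0, 0.0
    for features, market_price in impressions:
        ctr = estimate_ctr(features)       # estimated value of the impression
        bid = base_bid * ctr / avg_ctr     # simple linear bidding heuristic
        bid = min(bid, budget - spend)     # never bid beyond the remaining budget
        if bid > market_price:             # auction won
            spend += market_price          # pay the second price
            expected_clicks += ctr
    return spend, expected_clicks

# Hypothetical usage with synthetic impressions:
stream = ((random.random(), random.uniform(0.5, 3.0)) for _ in range(10_000))
print(run_campaign(stream, budget=500.0))
```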

Cited by 16 publications (17 citation statements)
References 15 publications

Citation statements:
“…The performance of their methodology was also evaluated using ten dynamic CPM campaigns, and the gains in conversions (CPA) and clicks (CPC) were 30.9% and 19.0%, respectively. Also relevant in this context is the publication of Du et al. [28], in which they improved RTB performance through a Constrained Markov Decision Process (CMDP) based on a reinforcement learning framework. A distributed representation model is used to estimate the CTR, where the estimated CTR is the state, the bid price is the action, and clicks are the reward.…”
Section: Performance Comparison With State-of-the-Art Methods (mentioning)
confidence: 99%
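As a concrete reading of that state/action/reward decomposition, the sketch below frames one auction step in reinforcement-learning terms. The discretization, the tabular Q update, and all names are illustrative assumptions; the paper's actual CMDP solution additionally enforces the budget constraint, which a plain Q-learning update like this does not.

```python
import numpy as np

N_STATES, N_ACTIONS = 10, 20           # discretized CTR bins and bid prices
BID_PRICES = np.linspace(0.1, 2.0, N_ACTIONS)
Q = np.zeros((N_STATES, N_ACTIONS))    # tabular action-value estimates

def to_state(estimated_ctr: float) -> int:
    # State = the estimated CTR, discretized into bins (illustrative choice).
    return min(N_STATES - 1, int(estimated_ctr / 0.01 * N_STATES))

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    # One unconstrained Q-learning step; a CMDP solver would additionally
    # account for the expected budget consumption of each action.
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])

# Hypothetical single step: observe state, pick a bid, learn from the click.
s = to_state(0.004)
a = int(Q[s].argmax())                 # bid BID_PRICES[a]
q_update(s, a, reward=1.0, next_state=to_state(0.003))
```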
“…$\alpha \log \pi_{\phi}(\hat{a}_t \mid s_t)$ is the entropy term, and the temperature parameter $\alpha$ is automatically adjusted via formula (16) to control the stochasticity of the optimal policy. We can therefore update the Policy network's parameters using the unbiased gradient estimator proposed in SAC, as shown in formula (14). It is worth noting that the agent uses the smaller of the two Q values to update the Policy network, which avoids overestimation.…”
Section: Solution Based on SAC (mentioning)
confidence: 99%
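For readers less familiar with SAC, the PyTorch-style sketch below shows the actor loss that passage describes: the entropy term $\alpha \log \pi_{\phi}(\hat{a}_t \mid s_t)$ and the minimum over twin Q estimates that curbs overestimation. The network interfaces and the temperature-tuning step (the paper's formulas (14) and (16)) are sketched from the standard SAC algorithm, not taken from this specific paper.

```python
import torch

def actor_loss(policy, q1, q2, states, log_alpha):
    """One SAC policy-update step: minimize E[alpha * log pi(a|s) - min(Q1, Q2)].

    `policy(states)` is assumed to return a reparameterized action sample and
    its log-probability; `q1`/`q2` are the twin critics.
    """
    actions, log_prob = policy(states)         # a_hat ~ pi_phi(.|s), reparameterized
    q_min = torch.min(q1(states, actions),     # take the smaller of the two
                      q2(states, actions))     # Q values to avoid overestimation
    alpha = log_alpha.exp().detach()           # temperature, tuned separately
    return (alpha * log_prob - q_min).mean()

def temperature_loss(log_alpha, log_prob, target_entropy):
    # Standard automatic temperature adjustment (cf. the cited formula (16)):
    # drive the policy's entropy toward a fixed target value.
    return -(log_alpha * (log_prob + target_entropy).detach()).mean()
```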
“…Furthermore, we model the adjustment-factor decisions for ad impressions over an ad delivery period as an MDP [14]. The RL agent's task is therefore to learn the optimal policy for generating adjustment factors.…”
Section: Introduction (mentioning)
confidence: 99%
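A minimal way to picture that setup: each decision step emits an adjustment factor that scales a base bid, and one episode spans a delivery period. All fields and names below are hypothetical; the cited work [14] defines the actual state and reward.

```python
from dataclasses import dataclass

@dataclass
class DeliveryStep:
    # One MDP step within an ad delivery period (hypothetical fields).
    remaining_budget: float   # part of the state
    time_left: int            # part of the state
    adjustment_factor: float  # the action: a multiplier on the base bid
    reward: float             # feedback, e.g. clicks or conversions

def adjusted_bid(base_bid: float, adjustment_factor: float) -> float:
    # The policy outputs the factor; the bid actually submitted is the product.
    return base_bid * adjustment_factor
```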
“…CMDPs can be used to model a wide variety of real problems. For instance, they can be used to maximize revenue in online advertising while respecting budget limits (Du et al., 2017), or, in robot control, to maximize the probability of reaching a target location within a temporal deadline (Carpin et al., 2014). As explained in Section 5, a CMDP is also used to model the problem considered in this paper.…”
Section: Background on Reinforcement Learning (mentioning)
confidence: 99%
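As a reference point for that passage, the standard CMDP objective can be written as follows. This is the textbook formulation (reward maximized subject to a bound on expected cost, e.g. a budget limit), not notation taken from any of the cited papers.

```latex
% Standard constrained-MDP objective: maximize expected discounted reward
% subject to a bound B on expected discounted cost (e.g. ad spend).
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le B
```

Here $r$ plays the role of clicks or revenue and $c$ the per-step spend, which recovers the budget-limited bidding setting the quoted sentence attributes to Du et al. (2017).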