Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising

Jin, Junhua; Song, Chengru; Li, Han; Gai, Kun; Wang, Jun; Zhang, Weinan

doi:10.1145/3269206.3272021

Cited by 120 publications (73 citation statements); references 23 publications.


“…In the platform, advertisers bid on plenty of granularities like ad clusters, items, shops, etc. Several simultaneously running recommendation approaches in all granularities produce candidate sets and the combination of them are passed to subsequent stages, like CTR prediction [32,31,23], ranking [33,13], etc. The comparison baseline is such a combination of all running recommendation methods.…”

Section: Online Results (mentioning)

confidence: 99%

Learning Tree-based Deep Model for Recommender Systems

Zhu, Zhang, et al. 2018

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Self Cite


Large-scale industrial recommender systems are usually confronted with computational problems due to the enormous corpus size. To retrieve and recommend the most relevant items to users under response time limits, resorting to an efficient index structure is an effective and practical solution. The previous work Tree-based Deep Model (TDM) [34] greatly improves recommendation accuracy using tree index. By indexing items in a tree hierarchy and training a user-node preference prediction model satisfying a max-heap like property in the tree, TDM provides logarithmic computational complexity w.r.t. the corpus size, enabling the use of arbitrary advanced models in candidate retrieval and recommendation. In tree-based recommendation methods, the quality of both the tree index and the user-node preference prediction model determines the recommendation accuracy for the most part. We argue that the learning of tree index and preference model has interdependence. Our purpose, in this paper, is to develop a method to jointly learn the index structure and user preference prediction model. In our proposed joint optimization framework, the learning of index and user preference prediction model are carried out under a unified performance measure. Besides, we come up with a novel hierarchical user preference representation utilizing the tree index hierarchy. Experimental evaluations with two large-scale real-world datasets show that the proposed method improves recommendation accuracy significantly.
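The logarithmic retrieval cost described in this abstract comes from searching the tree top-down and keeping only a small beam of nodes per level. The sketch below is a minimal illustration under stated assumptions: a complete binary tree stored heap-style in an array, and a scoring function standing in for the learned user-node preference model. The helper `build_max_heap_scores` and all other names are hypothetical; it derives internal-node scores from known leaf scores only to show the max-heap-like property, whereas the paper learns them.

```python
import heapq
from typing import Callable, List


def build_max_heap_scores(leaf_scores: List[float]) -> List[float]:
    """Give each internal node the max of its two children (max-heap-like tree).

    Hypothetical helper for illustration: a learned model would supply node scores.
    Nodes are numbered heap-style: root = 0, children of n are 2n+1 and 2n+2.
    """
    n = len(leaf_scores)
    scores = [0.0] * (2 * n - 1)
    scores[n - 1:] = leaf_scores  # leaves occupy the last n slots
    for node in range(n - 2, -1, -1):
        scores[node] = max(scores[2 * node + 1], scores[2 * node + 2])
    return scores


def beam_search_retrieve(
    num_leaves: int,
    score: Callable[[int], float],
    beam_size: int,
    k: int,
) -> List[int]:
    """Return the ids of up to k leaf items found by layer-wise beam search.

    num_leaves must be a power of two. Each level costs O(beam_size) score
    calls, and there are log2(num_leaves) levels below the root.
    """
    if num_leaves <= 0 or num_leaves & (num_leaves - 1):
        raise ValueError("num_leaves must be a positive power of two")
    first_leaf = num_leaves - 1  # heap index of the leftmost leaf
    beam = [0]  # start at the root
    while beam[0] < first_leaf:  # every beam node sits on the same level
        children = [c for n in beam for c in (2 * n + 1, 2 * n + 2)]
        beam = heapq.nlargest(beam_size, children, key=score)
    top = heapq.nlargest(k, beam, key=score)
    return [n - first_leaf for n in top]  # heap index -> item id


items = [0.1, 0.4, 0.3, 0.9, 0.2, 0.8, 0.5, 0.05]
node_scores = build_max_heap_scores(items)
print(beam_search_retrieve(len(items), node_scores.__getitem__, beam_size=2, k=2))  # [3, 5]
```

Because each internal node here scores at least as high as any leaf beneath it, the beam never discards the path to the best item; with a learned, approximate model that guarantee no longer holds exactly, which is why a beam wider than one is typically used.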


“…Wang et al [19] utilized deep Q network (DQN) to optimize the bidding strategy in DSP. Jin et al [11] formulated bidding optimization with multiagent reinforcement learning to balance the trade-off between the competition and cooperation among advertisers.…”

Section: RL Methods for Bidding Strategies (mentioning)

confidence: 99%

“…is scale factor therefore is used to optimize advertisers' bidding strategies in some researches [11,25]. Recommendation.…”

Section: Evaluation Platform (mentioning)

confidence: 99%
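The statements above describe bidding strategies in which an agent adjusts its bids through a scale factor. A minimal sketch of that idea follows, assuming each agent's RL action is a multiplicative factor `alpha` applied to an estimated impression value (predicted CTR times a per-click value), with bids clipped to a cap. The value model, the `max_bid` cap, the second-price rule used to resolve the auction, and all names are illustrative assumptions, not details taken from the cited papers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Agent:
    name: str
    value_per_click: float  # hypothetical advertiser value for one click
    alpha: float  # scale factor chosen by the agent's policy (the RL action)


def scaled_bid(agent: Agent, pctr: float, max_bid: float = 10.0) -> float:
    """Bid = alpha * estimated impression value, clipped to [0, max_bid]."""
    value = pctr * agent.value_per_click  # expected value of this impression
    return min(max(agent.alpha * value, 0.0), max_bid)


def second_price_auction(bids: List[Tuple[str, float]]) -> Optional[Tuple[str, float]]:
    """Assumed auction rule: highest bid wins and pays the second-highest bid."""
    if not bids:
        return None
    ranked = sorted(bids, key=lambda b: b[1], reverse=True)
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return ranked[0][0], price


agents = [Agent("a", value_per_click=2.0, alpha=1.2), Agent("b", value_per_click=3.0, alpha=0.7)]
pctr = 0.05  # hypothetical predicted CTR for this impression
bids = [(ag.name, scaled_bid(ag, pctr)) for ag in agents]
print(second_price_auction(bids))  # agent "a" wins and pays b's bid (about 0.105)
```

Agent "b" values the click more, but its smaller scale factor makes it lose this auction; learning `alpha` per agent is what lets an RL policy trade off winning impressions against cost, and in a multi-agent setting each agent's best `alpha` depends on the others'.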

Learning to Advertise for Organic Traffic Maximization in E-Commerce Product Feeds

Chen, Jin, Zhang, et al. 2019

Proceedings of the 28th ACM International Conference on Information and Knowledge Management

Self Cite


Most e-commerce product feeds provide blended results of advertised products and recommended products to consumers. The underlying advertising and recommendation platforms share similar if not exactly the same set of candidate products. Consumers' behaviors on the advertised results constitute part of the recommendation model's training data and therefore can influence the recommended results. We refer to this process as Leverage. Considering this mechanism, we propose a novel perspective that advertisers can strategically bid through the advertising platform to optimize their recommended organic traffic. By analyzing the real-world data, we first explain the principles of the Leverage mechanism, i.e., the dynamic models of Leverage. Then we introduce a novel Leverage optimization problem and formulate it with a Markov Decision Process. To deal with the sample complexity challenge in model-free reinforcement learning, we propose a novel Hybrid Training Leverage Bidding (HTLB) algorithm which combines the real-world samples and the emulator-generated samples to boost the learning speed and stability. Our offline experiments as well as the results from the online deployment demonstrate the superior performance of our approach.
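The hybrid-training idea in this abstract, combining scarce real-world samples with cheaper emulator-generated ones, can be illustrated with a replay buffer that draws a fixed share of each minibatch from each source. This is a minimal sketch under assumptions: the class `HybridReplayBuffer`, the mixing ratio `real_fraction`, and the opaque transition format are hypothetical and are not the authors' HTLB implementation, which may weight or schedule the two sources differently.

```python
import random
from collections import deque
from typing import Any, Deque, List, Optional


class HybridReplayBuffer:
    """Hypothetical buffer that mixes real and emulator transitions per minibatch."""

    def __init__(self, capacity: int, real_fraction: float, seed: Optional[int] = None):
        if not 0.0 <= real_fraction <= 1.0:
            raise ValueError("real_fraction must be in [0, 1]")
        # Separate bounded stores, so the oldest transitions of each source are evicted first.
        self.real: Deque[Any] = deque(maxlen=capacity)
        self.emulated: Deque[Any] = deque(maxlen=capacity)
        self.real_fraction = real_fraction
        self.rng = random.Random(seed)

    def add_real(self, transition: Any) -> None:
        self.real.append(transition)

    def add_emulated(self, transition: Any) -> None:
        self.emulated.append(transition)

    def sample(self, batch_size: int) -> List[Any]:
        # Target share of real samples, limited by how many are stored.
        n_real = min(round(batch_size * self.real_fraction), len(self.real))
        # Backfill the rest of the batch with emulator samples where available.
        n_emulated = min(batch_size - n_real, len(self.emulated))
        batch = self.rng.sample(list(self.real), n_real)
        batch += self.rng.sample(list(self.emulated), n_emulated)
        self.rng.shuffle(batch)
        return batch


buf = HybridReplayBuffer(capacity=1000, real_fraction=0.25, seed=0)
for i in range(100):
    buf.add_real(("real", i))
    buf.add_emulated(("emulated", i))
batch = buf.sample(8)
print(sum(1 for t in batch if t[0] == "real"))  # 2 real, 6 emulated
```

Fixing the real share per batch keeps real data from being swamped when the emulator produces far more transitions; raising `real_fraction` as real data accumulates is one plausible schedule, though that choice is an assumption here.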


“…The previous name is: Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning. successful applications of DRL techniques to optimize the decision-making process in E-commerce from different aspects including online recommendation [11], impression allocation [10,41], advertising bidding strategies [19,37,40] and product ranking [16].…”

Section: Introduction (mentioning)

confidence: 99%

“…In traditional online advertising, the ad positions are fixed, and we only need to determine which ads to be shown in these positions for each user request [26]. This can be modeled as an ads position bidding problem and DRL techniques have been shown to be effective in learning bidding strategies for advertisers [19,37,40]. However, fixing ad positions limit the flexibility of the advertising system.…”

Section: Introduction (mentioning)

confidence: 99%

Learning Adaptive Display Exposure for Real-Time Advertising

Wang, Jin, Hao, et al. 2019

Proceedings of the 28th ACM International Conference on Information and Knowledge Management

Self Cite


In E-commerce advertising, where product recommendations and product ads are presented to users simultaneously, the traditional setting is to display ads at fixed positions. However, under such a setting, the advertising system loses the flexibility to control the number and positions of ads, resulting in sub-optimal platform revenue and user experience. Consequently, major e-commerce platforms (e.g., Taobao.com) have begun to consider more flexible ways to display ads. In this paper, we investigate the problem of advertising with adaptive exposure: can we dynamically determine the number and positions of ads for each user visit under certain business constraints so that the platform revenue can be increased? More specifically, we consider two types of constraints: request-level constraint ensures user experience for each user visit, and platform-level constraint controls the overall platform monetization rate. We model this problem as a Constrained Markov Decision Process with per-state constraint (psCMDP) and propose a constrained two-level reinforcement learning approach to decompose the original problem into two relatively independent sub-problems. To accelerate policy learning, we also devise a constrained hindsight experience replay mechanism. Experimental evaluations on industry-scale real-world datasets demonstrate the merits of our approach in both obtaining higher revenue under the constraints and the effectiveness of the constrained hindsight experience replay mechanism.
