Reinforcement Learning Method for Ad Networks Ordering in Real-Time Bidding

Afshar, Reza Refaei; Zhang, Yingqian; Fırat, Murat; Kaymak, Uzay

doi:10.1007/978-3-030-37494-5_2

Cited by 4 publications

(3 citation statements)

References 21 publications

“…Advancements in DRL approaches in recent years have enabled considerable progress for the domain of COP applications [Cappart et al, 2021, Oren et al, 2021]. Some of the major COPs have been successfully solved using DRL such as the Travelling Salesman Problem (TSP) [Zhang et al, 2021, d O Costa et al, 2020, Zhang et al, 2020b], the Knap Sack Problem [Afshar et al, 2020, Cappart et al, 2021] and the Steiner Tree Problem [Du et al, 2021]. Zhang and Dietterich [1995] were able to show the potential of Reinforcement Learning (RL) for JSSPs as far back as 1995, by improving the results of the scheduling algorithm by Deale et al [1994] which used a temporal difference algorithm in combination with simulated annealing.…”

Section: Related Work (mentioning)

confidence: 99%

“…The implementation of DRL in the field of Operational Research (OR) has become quite significant. Several studies incorporating DRL to solve COP have shown promising results [Du et al, 2021, Afshar et al, 2020]. Moreover, DRL provides a significantly faster approximation for COPs compared to exhaustive search, metaheuristics, or other conventional heuristics.…”

Section: Introduction (mentioning)

confidence: 99%

A Reinforcement Learning Approach for Scheduling Problems With Improved Generalization Through Order Swapping

Vivekanandan, Wirth, Karlbauer et al. 2023

Preprint

The scheduling of production resources (such as associating jobs to machines) plays a vital role in the manufacturing industry, not only for saving energy but also for increasing overall efficiency. Among the different job scheduling problems, the Job Shop Scheduling Problem (JSSP) is addressed in this work. The JSSP falls into the category of NP-hard Combinatorial Optimization Problems (COPs), for which solving the problem through exhaustive search becomes infeasible. Simple heuristics such as First In, First Out (FIFO) and Largest Processing Time First (LPT), and metaheuristics such as Tabu search, are often adopted to solve the problem by truncating the search space. These methods become less viable for large problem sizes, as their solutions are either far from the optimum or time-consuming to compute. In recent years, research on using Deep Reinforcement Learning (DRL) to solve COPs has gained interest and has shown promising results in terms of solution quality and computational efficiency. In this work, we provide a novel DRL approach to solving the JSSP, with generalization and solution effectiveness as objectives. In particular, we employ the Proximal Policy Optimization (PPO) algorithm, which adopts the policy-gradient paradigm and is found to perform well in the constrained dispatching of jobs. We incorporated an Order Swapping Mechanism (OSM) in the environment to achieve better generalized learning of the problem. The performance of the presented approach is analyzed in depth by using a set of available benchmark instances and comparing our results with the work of other groups.
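As a rough illustration of the two dispatching rules named in this abstract, the following sketch compares FIFO and LPT orderings. It deliberately assumes a much simpler setting than the JSSP (independent jobs on identical parallel machines), and the function name `makespan` and the job durations are hypothetical, not taken from the paper.

```python
import heapq

def makespan(processing_times, n_machines):
    """Greedy list scheduling: each job, in the given order, goes to the
    machine that becomes free first. Returns the finishing time of the
    last machine. Simplified stand-in, not a JSSP solver."""
    loads = [0] * n_machines          # current finishing time per machine
    heapq.heapify(loads)
    for p in processing_times:
        earliest = heapq.heappop(loads)       # machine that frees up first
        heapq.heappush(loads, earliest + p)   # schedule the job on it
    return max(loads)

jobs = [1, 1, 1, 1, 4]  # hypothetical durations, listed in arrival order

fifo_makespan = makespan(jobs, n_machines=2)                       # arrival order
lpt_makespan = makespan(sorted(jobs, reverse=True), n_machines=2)  # longest first

print(fifo_makespan, lpt_makespan)  # prints "6 4"
```

On this toy instance LPT happens to reach the optimum while FIFO does not; neither rule carries such a guarantee in general, which is the gap the abstract says DRL methods aim to close.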

“…Preliminary analysis on the relations between features like floor price and the revenue are discussed in [27]. We Fig.…”

Section: Data Description (mentioning)

confidence: 99%

Dynamic Ad Network Ordering Method Using Reinforcement Learning

Afshar, Zhang, Kaymak 2022

Int J Comput Intell Syst

Self Cite

Real time bidding is one of the most popular ways of selling impressions in online advertising, where online ad publishers allocate some blocks in their websites to sell in online auctions. In real time bidding, ad networks connect publishers and advertisers. There are many available ad networks for publishers to choose from. A possible approach for selecting ad networks and sending ad requests is called Waterfall Strategy, in which ad networks are selected sequentially. The ordering of the ad networks is very important for publishers, and finding the ordering that will provide maximum revenue is a hard problem due to the highly dynamic environment. In this paper, we propose a dynamic ad network ordering method to find the best ordering of ad networks for publishers that opt for Waterfall Strategy to select ad networks. This method consists of two steps. The first step is a prediction model that is trained on real time bidding historical data and provides an estimation of revenue for each impression. These estimations are used as initial values for the Q-table in the second step. The second step is based on Reinforcement Learning and improves the output of the prediction model. By calculating the revenue of our method and comparing that with the revenue of a fixed and predefined ordering method, we show that our proposed dynamic ad network ordering method increases publishers’ revenue.
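The two-step idea in this abstract (revenue predictions seeding a Q-table that Reinforcement Learning then refines) can be sketched with tabular Q-learning. Everything below is an assumption made for illustration: the network names, fill probabilities, payouts, predicted revenues, the state encoding (the set of networks already tried), and the simulated environment are hypothetical and are not the authors' model or data.

```python
import random
from itertools import combinations

NETWORKS = ["A", "B", "C"]                   # hypothetical ad networks
FILL_PROB = {"A": 0.2, "B": 0.6, "C": 0.9}   # assumed chance a request is filled
PAYOUT = {"A": 5.0, "B": 2.0, "C": 0.5}      # assumed revenue when filled
PREDICTED = {"A": 0.8, "B": 1.0, "C": 0.4}   # stand-in for the step-1 predictions

def init_q(predicted):
    """Step 1 (assumed form): seed every Q-value with the predicted revenue."""
    q = {}
    for k in range(len(NETWORKS)):
        for tried in combinations(NETWORKS, k):
            state = frozenset(tried)
            for a in NETWORKS:
                if a not in state:
                    q[(state, a)] = predicted[a]
    return q

def train(q, episodes=20000, alpha=0.05, gamma=1.0, epsilon=0.1, seed=0):
    """Step 2 (assumed form): epsilon-greedy Q-learning on a simulated waterfall."""
    rng = random.Random(seed)
    for _ in range(episodes):
        state = frozenset()
        while len(state) < len(NETWORKS):
            actions = [a for a in NETWORKS if a not in state]
            if rng.random() < epsilon:
                action = rng.choice(actions)                        # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            filled = rng.random() < FILL_PROB[action]
            reward = PAYOUT[action] if filled else 0.0
            next_state = state | {action}
            remaining = [a for a in NETWORKS if a not in next_state]
            if filled or not remaining:
                target = reward            # episode ends: impression sold or lost
            else:
                target = reward + gamma * max(q[(next_state, a)] for a in remaining)
            q[(state, action)] += alpha * (target - q[(state, action)])
            if filled:
                break
            state = next_state
    return q

def greedy_order(q):
    """Read a waterfall ordering off the Q-table by following the best action."""
    state, order = frozenset(), []
    while len(state) < len(NETWORKS):
        best = max((a for a in NETWORKS if a not in state),
                   key=lambda a: q[(state, a)])
        order.append(best)
        state = state | {best}
    return order

print("order from predictions alone:", greedy_order(init_q(PREDICTED)))
print("order after Q-learning:", greedy_order(train(init_q(PREDICTED))))
```

Seeding the table with predictions means the agent starts from the prediction model's ordering instead of from arbitrary values, and the updates then adjust that ordering toward the revenue actually observed in the simulated auctions.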
