Proceedings of the 24th International Conference on World Wide Web 2015
DOI: 10.1145/2740908.2741998

Ad Recommendation Systems for Life-Time Value Optimization

Abstract: The main objective in the ad recommendation problem is to find a strategy that, for each visitor of the website, selects the ad that has the highest probability of being clicked. This strategy could be computed using supervised learning or contextual bandit algorithms, which treat two visits of the same user as two separate independent visitors, and thus, optimize greedily for a single step into the future. Another approach would be to use reinforcement learning (RL) methods, which differentiate between two vi…
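The abstract's distinction between greedy single-step optimization and life-time value can be made concrete with a toy sketch (not from the paper; all states, ads, and probabilities below are made-up numbers for illustration). A greedy policy picks the ad with the highest immediate click probability, while value iteration accounts for how an ad changes the visitor's future behavior:

```python
# Toy illustration: greedy click-through vs. life-time value.
# States: 0 = engaged visitor, 1 = annoyed visitor (absorbing).
# Actions: 0 = aggressive ad, 1 = gentle ad. All numbers are illustrative.

click_prob = {          # P(click | state, ad)
    (0, 0): 0.10, (0, 1): 0.06,
    (1, 0): 0.02, (1, 1): 0.03,
}
next_state = {          # aggressive ads annoy the visitor; annoyance persists
    (0, 0): 1, (0, 1): 0,
    (1, 0): 1, (1, 1): 1,
}
gamma = 0.9             # discount over future visits of the same user

def value_iteration(n_iters=200):
    """Compute state values under the optimal (life-time value) policy."""
    V = {0: 0.0, 1: 0.0}
    for _ in range(n_iters):
        V = {s: max(click_prob[s, a] + gamma * V[next_state[s, a]]
                    for a in (0, 1))
             for s in (0, 1)}
    return V

V = value_iteration()
# Greedy (bandit-style) choice for an engaged visitor: immediate clicks only.
greedy_ad = max((0, 1), key=lambda a: click_prob[0, a])
# LTV choice: immediate clicks plus discounted value of the resulting state.
ltv_ad = max((0, 1), key=lambda a: click_prob[0, a] + gamma * V[next_state[0, a]])
```

With these numbers, the greedy policy shows the aggressive ad (0.10 > 0.06 immediate clicks), while the life-time-value policy prefers the gentle ad, because annoying the visitor destroys future click revenue.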

Cited by 82 publications (60 citation statements). References 13 publications.
“…In recent years, there have been a series of widely noted successful applications of deep RL approaches (e.g., AlphaGo [23]), demonstrating their ability to better understand the environment, and enabling them to infer high-level causal relationships. There have been attempts to invoke RL in recommender systems in a non-KG setting, such as for ads recommendation [25], news recommendation [35] and post-hoc explainable recommendation [27]. At the same time, researchers have also explored RL in KG settings for other tasks such as question answering (QA) [3,14,29], which formulates multi-hop reasoning as a sequential decision making problem.…”
Section: Reinforcement Learning
confidence: 99%
“…Firstly, techniques have been developed to estimate the performance of deploying a particular RL model prior to deployment. This helps in communicating risks and benefits of RL solutions with stakeholders and moves RL further into the realm of feasible technologies for high-impact application domains [200]. For single-step decision making problems, contextual bandit algorithms with theoretical bounds on decision-theoretic regret have become available.…”
Section: Den Hengst et al. / Reinforcement Learning for Personalization
confidence: 99%
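The single-step contextual bandit algorithms with regret bounds mentioned in the statement above can be sketched minimally with LinUCB-style upper confidence bounds (a hypothetical sketch: the class name, constants, and disjoint-linear-model setup are illustrative, not taken from the cited works):

```python
import numpy as np

# Minimal disjoint-model LinUCB sketch: one linear reward model per ad;
# select the ad with the highest upper confidence bound on expected clicks.

class LinUCB:
    def __init__(self, n_ads, dim, alpha=1.0):
        self.alpha = alpha                                  # exploration width
        self.A = [np.eye(dim) for _ in range(n_ads)]        # per-ad Gram matrix
        self.b = [np.zeros(dim) for _ in range(n_ads)]      # per-ad reward sums

    def select(self, x):
        """Pick the ad with the highest UCB for context vector x."""
        ucbs = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                               # ridge estimate
            ucbs.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(ucbs))

    def update(self, ad, x, reward):
        """Fold an observed (context, ad, reward) triple into ad's model."""
        self.A[ad] += np.outer(x, x)
        self.b[ad] += reward * x
```

This is exactly the single-step view the abstract contrasts with RL: each selection maximizes a confidence-adjusted estimate of the immediate click reward, with no model of how showing an ad changes the user's future visits.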
“…Nevertheless, a practical solution can be employed to benefit from the results in this study. The solution, named "off-policy evaluation framework" [51], keeps track of the best performing policy. As we are evaluating the policy that achieves maximum average data rates among the learned policies, we can benefit from the results in this study by designing the algorithm such that the off-policy evaluation framework is performed.…”
Section: Results
confidence: 99%
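The off-policy evaluation idea quoted above — estimating how a candidate policy would perform from logs collected under a different policy, without deploying it — can be sketched with inverse propensity scoring (a hypothetical sketch: the function name, log format, and numbers are illustrative, not from the cited work):

```python
# Inverse-propensity-scoring (IPS) off-policy value estimate: reweight each
# logged reward by how much more (or less) likely the candidate policy is
# to take the logged action than the logging policy was.

def ips_estimate(logs, target_policy):
    """Estimate the candidate policy's average reward from logged data.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability the logging policy assigned to
          the action it actually took.
    target_policy(context, action) -> probability the candidate policy
          would take `action` in `context`.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)

# Example: the logging policy chose uniformly between 2 ads (prob 0.5 each);
# the candidate policy always shows ad 1.
logs = [
    ("u1", 0, 0.0, 0.5),
    ("u2", 1, 1.0, 0.5),
    ("u3", 1, 1.0, 0.5),
    ("u4", 0, 1.0, 0.5),
]
always_ad1 = lambda ctx, a: 1.0 if a == 1 else 0.0
value = ips_estimate(logs, always_ad1)
```

Logged visits where the candidate would have acted differently get weight zero; visits it would repeat get weight 2 (= 1.0 / 0.5), so the estimate is unbiased as long as the logging policy gives every candidate action nonzero probability.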