Proceedings of the 24th International Conference on World Wide Web 2015
DOI: 10.1145/2740908.2741998

Ad Recommendation Systems for Life-Time Value Optimization

Abstract: The main objective in the ad recommendation problem is to find a strategy that, for each visitor of the website, selects the ad that has the highest probability of being clicked. This strategy could be computed using supervised learning or contextual bandit algorithms, which treat two visits of the same user as two separate independent visitors, and thus, optimize greedily for a single step into the future. Another approach would be to use reinforcement learning (RL) methods, which differentiate between two vi…
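The abstract's distinction between greedy single-step optimization and life-time value can be made concrete with a toy sketch (not from the paper; all states, ads, and probabilities below are made-up numbers for illustration). A greedy policy picks the ad with the highest immediate click probability, while value iteration accounts for how an ad changes the visitor's future behavior:

```python
# Toy illustration: greedy click-through vs. life-time value.
# States: 0 = engaged visitor, 1 = annoyed visitor (absorbing).
# Actions: 0 = aggressive ad, 1 = gentle ad. All numbers are illustrative.

click_prob = {          # P(click | state, ad)
    (0, 0): 0.10, (0, 1): 0.06,
    (1, 0): 0.02, (1, 1): 0.03,
}
next_state = {          # aggressive ads annoy the visitor; annoyance persists
    (0, 0): 1, (0, 1): 0,
    (1, 0): 1, (1, 1): 1,
}
gamma = 0.9             # discount over future visits of the same user

def value_iteration(n_iters=200):
    """Compute state values under the optimal (life-time value) policy."""
    V = {0: 0.0, 1: 0.0}
    for _ in range(n_iters):
        V = {s: max(click_prob[s, a] + gamma * V[next_state[s, a]]
                    for a in (0, 1))
             for s in (0, 1)}
    return V

V = value_iteration()
# Greedy (bandit-style) choice for an engaged visitor: immediate clicks only.
greedy_ad = max((0, 1), key=lambda a: click_prob[0, a])
# LTV choice: immediate clicks plus discounted value of the resulting state.
ltv_ad = max((0, 1), key=lambda a: click_prob[0, a] + gamma * V[next_state[0, a]])
```

With these numbers, the greedy policy shows the aggressive ad (0.10 > 0.06 immediate clicks), while the life-time-value policy prefers the gentle ad, because annoying the visitor destroys future click revenue.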

Cited by 82 publications (60 citation statements). References 13 publications.
“…In recent years, there have been a series of widely noted successful applications of deep RL approaches (e.g., AlphaGo [23]), demonstrating their ability to better understand the environment, and enabling them to infer high-level causal relationships. There have been attempts to invoke RL in recommender systems in a non-KG setting, such as for ads recommendation [25], news recommendation [35] and post-hoc explainable recommendation [27]. At the same time, researchers have also explored RL in KG settings for other tasks such as question answering (QA) [3,14,29], which formulates multi-hop reasoning as a sequential decision making problem.…”
Section: Reinforcement Learning
confidence: 99%
“…Firstly, techniques have been developed to estimate the performance of deploying a particular RL model prior to deployment. This helps in communicating risks and benefits of RL solutions with stakeholders and moves RL further into the realm of feasible technologies for high-impact application domains [200]. For single-step decision making problems, contextual bandit algorithms with theoretical bounds on decision-theoretic regret have become available.…”
Section: Den Hengst et al. / Reinforcement Learning for Personalization
confidence: 99%
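The single-step contextual bandit algorithms with regret bounds mentioned in the statement above can be sketched minimally with LinUCB-style upper confidence bounds (a hypothetical sketch: the class name, constants, and disjoint-linear-model setup are illustrative, not taken from the cited works):

```python
import numpy as np

# Minimal disjoint-model LinUCB sketch: one linear reward model per ad;
# select the ad with the highest upper confidence bound on expected clicks.

class LinUCB:
    def __init__(self, n_ads, dim, alpha=1.0):
        self.alpha = alpha                                  # exploration width
        self.A = [np.eye(dim) for _ in range(n_ads)]        # per-ad Gram matrix
        self.b = [np.zeros(dim) for _ in range(n_ads)]      # per-ad reward sums

    def select(self, x):
        """Pick the ad with the highest UCB for context vector x."""
        ucbs = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                               # ridge estimate
            ucbs.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(ucbs))

    def update(self, ad, x, reward):
        """Fold an observed (context, ad, reward) triple into ad's model."""
        self.A[ad] += np.outer(x, x)
        self.b[ad] += reward * x
```

This is exactly the single-step view the abstract contrasts with RL: each selection maximizes a confidence-adjusted estimate of the immediate click reward, with no model of how showing an ad changes the user's future visits.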
“…Nevertheless, a practical solution can be employed to benefit from the results in this study. The solution, named "off-policy evaluation framework" [51], keeps track of the best performing policy. As we are evaluating the policy that achieves maximum average data rates among the learned policies, we can benefit from the results in this study by designing the algorithm such that the off-policy evaluation framework is performed.…”
Section: Results
confidence: 99%
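The off-policy evaluation idea quoted above — estimating how a candidate policy would perform from logs collected under a different policy, without deploying it — can be sketched with inverse propensity scoring (a hypothetical sketch: the function name, log format, and numbers are illustrative, not from the cited work):

```python
# Inverse-propensity-scoring (IPS) off-policy value estimate: reweight each
# logged reward by how much more (or less) likely the candidate policy is
# to take the logged action than the logging policy was.

def ips_estimate(logs, target_policy):
    """Estimate the candidate policy's average reward from logged data.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability the logging policy assigned to
          the action it actually took.
    target_policy(context, action) -> probability the candidate policy
          would take `action` in `context`.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)

# Example: the logging policy chose uniformly between 2 ads (prob 0.5 each);
# the candidate policy always shows ad 1.
logs = [
    ("u1", 0, 0.0, 0.5),
    ("u2", 1, 1.0, 0.5),
    ("u3", 1, 1.0, 0.5),
    ("u4", 0, 1.0, 0.5),
]
always_ad1 = lambda ctx, a: 1.0 if a == 1 else 0.0
value = ips_estimate(logs, always_ad1)
```

Logged visits where the candidate would have acted differently get weight zero; visits it would repeat get weight 2 (= 1.0 / 0.5), so the estimate is unbiased as long as the logging policy gives every candidate action nonzero probability.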