Jayden Ooi scite author profile

Jayden Ooi

5 Publications

16 Citation Statements Received

62 Citation Statements Given

Affiliations

Google (United States)

Publications

Advantage Amplification in Slowly Evolving Latent-State Environments

Mladenov, Meshi, Ooi et al. 2019

Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL). In this work, we identify and analyze several key hurdles for RL in such environments, including belief state error and small action advantage. We develop a general principle called advantage amplification that can overcome these hurdles through the use of temporal abstraction. We propose several aggregation methods and prove they induce amplification in certain settings. We also bound the loss in optimality incurred by our methods in environments where latent state evolves slowly and demonstrate their performance empirically in a stylized user-modeling task.
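A toy illustration of the "small action advantage" hurdle, not the paper's construction: assume a hypothetical one-dimensional latent state x that moves only a fraction eta toward the chosen action each step (x' = (1 - eta) * x + eta * a, with a in {0, 1}) and a reward equal to x. Repeating an action for k steps stands in here as one simple form of temporal abstraction. The dynamics, function names, and parameter values are all assumptions made for this sketch.

```python
def discounted_return(actions, x0, eta, gamma, horizon, default_action):
    """Roll out the hypothetical latent dynamics and return the discounted reward.

    Assumed dynamics: x_{t+1} = (1 - eta) * x_t + eta * a_t, with reward r_t = x_t.
    The first len(actions) steps use `actions`; later steps use `default_action`.
    """
    x, total = x0, 0.0
    for t in range(horizon):
        a = actions[t] if t < len(actions) else default_action
        total += (gamma ** t) * x  # reward depends only on the slowly moving latent state
        x = (1.0 - eta) * x + eta * a  # each action nudges the latent state only slightly
    return total


def macro_advantage(k, x0=0.5, eta=0.01, gamma=0.9, horizon=500):
    """Return gap between repeating a=1 and repeating a=0 for the first k steps.

    After those k steps both rollouts play a=1, the better action in this toy model,
    so k=1 gives the ordinary one-step advantage.
    """
    good = discounted_return([1] * k, x0, eta, gamma, horizon, default_action=1)
    bad = discounted_return([0] * k, x0, eta, gamma, horizon, default_action=1)
    return good - bad


if __name__ == "__main__":
    one_step = macro_advantage(1)
    ten_step = macro_advantage(10)
    print(f"1-step gap: {one_step:.4f}, 10-step gap: {ten_step:.4f}, ratio: {ten_step / one_step:.2f}")
```

With eta = 0.01 and gamma = 0.9 the one-step advantage is only about 0.083, while committing to the action for k = 10 steps widens the gap by roughly 6.5 times. In this linear toy the gap grows by the factor (1 - gamma^k) / (1 - gamma).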

BRPO: Batch Residual Policy Optimization

Sohn, Chow, Ooi et al. 2020

Preprint

Advantage Amplification in Slowly Evolving Latent-State Environments

Mladenov, Meshi, Ooi et al. 2019

Preprint

Towards Content Provider Aware Recommender Systems

Zhan, Christakopoulou et al. 2021

Most existing recommender systems focus primarily on matching users (content consumers) to content that maximizes user satisfaction on the platform. It is increasingly clear, however, that content providers have a critical influence on user satisfaction through content creation, largely determining the content pool available for recommendation. A natural question thus arises: can we design recommenders that take into account the long-term utility of both users and content providers? By doing so, we hope to sustain more content providers and a more diverse content pool for long-term user satisfaction. Understanding the full impact of recommendations on both user and content provider groups is challenging. This paper investigates one approach toward building a content provider aware recommender and evaluates its impact in a simulated setup. To characterize the user-recommender-provider interdependence, we complement user modeling by also formalizing provider dynamics. The resulting joint dynamical system gives rise to a weakly coupled partially observable Markov decision process driven by recommender actions and user feedback to providers. We then build a REINFORCE recommender agent, coined EcoAgent, to optimize a joint objective of user utility and the counterfactual utility lift of the content provider associated with the recommended content, which we show to be equivalent to maximizing overall user utility and the utilities of all content providers on the platform under some mild assumptions. To evaluate our approach, we introduce a simulation environment capturing the key interactions among users, providers, and the recommender. We offer a number of simulated experiments that shed light on both the benefits and the limitations of our approach. These results help clarify how and when a content provider aware recommender agent is beneficial in building multi-stakeholder recommender systems.
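A minimal sketch of the kind of objective described above, assuming a single-step softmax policy over a handful of candidate items and a hypothetical trade-off weight `lam`. The reward is the user's utility plus `lam` times the counterfactual utility lift of the recommended item's provider, and the policy is updated with a plain REINFORCE step against a running-average baseline. The item values, `lam`, the baseline, and all function names are illustrative assumptions; this is not EcoAgent itself, which the abstract places in a multi-step, partially observable setting.

```python
import math
import random


def joint_reward(user_utility, provider_utility_with, provider_utility_without, lam=1.0):
    """User utility plus lam times the provider's counterfactual utility lift (assumed form)."""
    return user_utility + lam * (provider_utility_with - provider_utility_without)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits, action, reward, baseline, lr=0.1):
    """One REINFORCE step for a softmax policy over items.

    d log pi(action) / d logit_i = 1[i == action] - pi_i, scaled by (reward - baseline).
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        z + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (z, p) in enumerate(zip(logits, probs))
    ]


def train(items, lam=1.0, steps=3000, lr=0.1, seed=0):
    """items: list of (user_utility, provider_utility_with, provider_utility_without)."""
    rng = random.Random(seed)
    logits = [0.0] * len(items)
    baseline = 0.0
    for _ in range(steps):
        # Sample a recommendation from the current policy.
        action = rng.choices(range(len(items)), weights=softmax(logits))[0]
        reward = joint_reward(*items[action], lam=lam)
        logits = reinforce_update(logits, action, reward, baseline, lr)
        baseline += 0.05 * (reward - baseline)  # running-average baseline reduces variance
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical items: (user utility, provider utility if shown, provider utility if not shown)
    items = [(1.0, 0.0, 0.0), (0.9, 0.75, 0.25), (0.2, 0.1, 0.0)]
    print(train(items, lam=1.0))
```

In the hypothetical item list, the first item has the highest user utility alone (1.0 versus 0.9), but with `lam=1` the second item has the highest joint reward (1.4 versus 1.0) because of its provider's lift, so the learned policy shifts toward it. Setting `lam=0` recovers a purely user-centric reward.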

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Liu, Cheng et al. 2020

Preprint

Deep reinforcement learning (RL) has proven powerful for decision making in simulated environments. However, training deep RL models is challenging in real-world applications such as production-scale health-care or recommender systems, because interactions are expensive and deployment budgets are limited. One source of this data inefficiency is the expensive hyper-parameter tuning required when optimizing deep neural networks. We propose Adaptive Behavior Policy Sharing (ABPS), a data-efficient training algorithm that shares the experience collected by a behavior policy adaptively selected from a pool of agents trained with an ensemble of hyper-parameters. We further extend ABPS to evolve hyper-parameters during training by hybridizing it with an adapted version of Population Based Training (ABPS-PBT). We conduct experiments on multiple Atari games with up to 16 hyper-parameter/architecture setups. ABPS achieves superior overall performance, reduced variance among the top 25% of agents, and equivalent performance on the best agent compared to conventional hyper-parameter tuning with independent training, even though ABPS requires only the same number of environment interactions as training a single agent. We also show that ABPS-PBT further improves convergence speed and reduces variance.

Recent years have witnessed the success of deep reinforcement learning (RL) in solving complex sequential decision-making problems such as games [Mnih et al., 2013; Schulman et al., 2015; Mnih et al., 2016]. However, it has yet to be proven effective in real-world applications such as large-scale …

*Work was done while an intern at Google.
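A minimal sketch of the sharing loop described above, under stated assumptions: the agent pool is reduced to hypothetical agents whose episode returns are drawn from fixed Gaussian distributions, and the adaptive selection rule is UCB1, a standard bandit rule chosen here only for illustration (the paper's actual selection rule may differ). What it shows is that each episode costs a single environment rollout, and that rollout goes into one buffer shared by the whole pool. The class and function names are hypothetical.

```python
import math
import random


class UCBBehaviorSelector:
    """Choose which agent in the pool acts as the behavior policy (UCB1 rule, an assumption)."""

    def __init__(self, num_agents, c=2.0):
        self.counts = [0] * num_agents
        self.sums = [0.0] * num_agents
        self.c = c

    def select(self):
        # Try every agent once before applying the confidence bonus.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)
        scores = [
            self.sums[i] / self.counts[i]
            + math.sqrt(self.c * math.log(total) / self.counts[i])
            for i in range(len(self.counts))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, agent, episode_return):
        self.counts[agent] += 1
        self.sums[agent] += episode_return


def run_abps_sketch(true_means, episodes=2000, seed=0):
    """Toy loop: one environment rollout per episode, shared by the whole pool."""
    rng = random.Random(seed)
    selector = UCBBehaviorSelector(len(true_means))
    shared_buffer = []  # in a real system every agent would train off-policy from this buffer
    for _ in range(episodes):
        agent = selector.select()
        # Hypothetical stand-in for rolling out the selected agent in the environment.
        episode_return = rng.gauss(true_means[agent], 0.1)
        shared_buffer.append((agent, episode_return))
        selector.update(agent, episode_return)
    return selector.counts, shared_buffer


if __name__ == "__main__":
    counts, buffer = run_abps_sketch([0.2, 0.5, 0.8])
    print("behavior-policy selections per agent:", counts)
    print("shared transitions collected:", len(buffer))
```

Because only the selected agent interacts with the environment, the total number of rollouts equals the number of episodes regardless of pool size, and the selector gradually concentrates on the agent with the highest observed returns.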
