Ramtin Keramati scite author profile

Ramtin Keramati

5 Publications

61 Citation Statements Received

111 Citation Statements Given

Affiliations

Stanford University

Publications

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

Keramati, Dann, Tamkin, et al. 2020. AAAI

While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal policies in Markov decision processes based on the optimism in the face of uncertainty principle. This method relies on a novel optimistic version of the distributional Bellman operator that moves probability mass from the lower to the upper tail of the return distribution. We prove asymptotic convergence and optimism of this operator for the tabular policy evaluation case. We further demonstrate that our algorithm finds CVaR-optimal policies substantially faster than existing baselines in several simulated environments with discrete and continuous state spaces.
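As a rough illustration of the CVaR objective this abstract refers to (not the paper's algorithm), the sketch below estimates CVaR from sampled returns as the mean of the worst alpha-fraction of outcomes. The function name `empirical_cvar` and the sample returns are hypothetical, chosen only for this example.

```python
import numpy as np

def empirical_cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns (the lower tail)."""
    sorted_returns = np.sort(np.asarray(returns, dtype=float))
    # Number of samples in the lower tail; always keep at least one.
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    return sorted_returns[:k].mean()

# Hypothetical returns: mostly good outcomes with two rare bad ones.
returns = [10.0, 8.0, 9.0, -20.0, 7.0, 11.0, 9.5, 8.5, -5.0, 10.5]

print("expected return:", np.mean(returns))
print("CVaR at alpha=0.2:", empirical_cvar(returns, 0.2))
```

The expected return averages over all outcomes, while CVaR at a small alpha looks only at the lower tail, which is why a policy can look good on average yet poor under a risk-sensitive objective.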

Significant contribution of small icebergs to the freshwater budget in Greenland fjords

Rezvanbehbahani, Stearns, Keramati, et al. 2020. Commun Earth Environ

Icebergs represent nearly half of the mass loss from the Greenland Ice Sheet and provide a distributed source of freshwater along fjords which can alter fjord circulation, nutrient levels, and ultimately the Meridional Overturning Circulation. Here we present analyses of high resolution optical satellite imagery using convolutional neural networks to accurately delineate iceberg edges in two East Greenland fjords. We find that a significant portion of icebergs in fjords are small icebergs that were not detected in previously available coarser resolution satellite images. We show that the preponderance of small icebergs results in high freshwater delivery, as well as a short life span of icebergs in fjords. We conclude that an inability to identify small icebergs leads to an inaccurate frequency-size distribution of icebergs in Greenland fjords, an underestimation of iceberg area (specifically for small icebergs), and an overestimation of iceberg life span.

Value Driven Representation for Human-in-the-Loop Reinforcement Learning

Keramati, Brunskill. 2019

Interactive adaptive systems powered by Reinforcement Learning (RL) have many potential applications, such as intelligent tutoring systems. In such systems there is typically an external human system designer who is creating, monitoring and modifying the interactive adaptive system, trying to improve its performance on the target outcomes. In this paper we focus on the algorithmic foundation of how to help the system designer choose the set of sensors or features to define the observation space used by a reinforcement learning agent. We present an algorithm, value driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent so that it is sufficient to capture a (near) optimal policy. To do so we introduce a new method to optimistically estimate the value of a policy using offline simulated Monte Carlo rollouts. We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.

CCS Concepts: Human-centered computing → Human computer interaction (HCI).

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

Keramati, Dann, Tamkin, et al. 2019. Preprint

While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal policies in Markov decision processes based on the optimism in the face of uncertainty principle. This method relies on a novel optimistic version of the distributional Bellman operator that moves probability mass from the lower to the upper tail of the return distribution. We prove asymptotic convergence and optimism of this operator for the tabular policy evaluation case. We further demonstrate that our algorithm finds CVaR-optimal policies substantially faster than existing baselines in several simulated environments with discrete and continuous state spaces.

Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation

Keramati, Gottesman, Celi, et al. 2021. Preprint

Off-policy policy evaluation methods for sequential decision making can be used to help identify if a proposed decision policy is better than a current baseline policy. However, a new decision policy may be better than a baseline policy for some individuals but not others. This has motivated a push towards personalization and accurate per-state estimates of heterogeneous treatment effects (HTEs). Given the limited data present in many important applications, individual predictions can come at a cost to accuracy and confidence in such predictions. We develop a method to balance the need for personalization with confident predictions by identifying subgroups where it is possible to confidently estimate the expected difference in a new decision policy relative to a baseline. We propose a novel loss function that accounts for uncertainty during the subgroup partitioning phase. In experiments, we show that our method can be used to form accurate predictions of HTEs where other methods struggle.
