2017
DOI: 10.1609/aaai.v31i1.10887

Estimating the Maximum Expected Value in Continuous Reinforcement Learning Problems

Abstract: This paper addresses the estimation of the maximum expected value of an infinite set of random variables. This estimation problem is relevant in many fields, including Reinforcement Learning (RL). In RL it is well known that, in some stochastic environments, a bias in the estimation error can grow step by step, inflating the approximation error and leading to large overestimates of the true action values. Several approaches have recently been proposed to reduce this bias and obtain better action-value estimates, but a…
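The overestimation mentioned in the abstract is a consequence of taking the maximum of noisy sample means: by Jensen's inequality, the expectation of that maximum is at least the maximum of the true means. The following minimal Python sketch, with illustrative numbers that are not from the paper, shows the positive bias directly:

# Minimal sketch (illustrative only, not the paper's experiment): the maximum
# of noisy sample means overestimates the true maximum expected value.
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_samples, n_trials = 10, 20, 5000
true_means = np.zeros(n_vars)            # true maximum expected value is 0

max_of_means = []
for _ in range(n_trials):
    samples = rng.normal(true_means, 1.0, size=(n_samples, n_vars))
    max_of_means.append(samples.mean(axis=0).max())

# The Maximum Estimator is positively biased: its average is well above 0.
print("Average Maximum Estimator value (true maximum is 0):", np.mean(max_of_means))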

Cited by 8 publications (3 citation statements)
References: 8 publications
“…Notice that both Seq(GP-UCB-SW) and Seq(GP-UCB-CD) provide an unbiased estimate of the maximum (or maximum and minimum) contaminant concentrations and temporal location. Such estimates are obtained for each monitoring day through a Monte Carlo approach drawing 100 GP realizations to estimate the probability that each time instant corresponds to the maximum (or minimum) contaminant concentration and using those probabilities to perform a weighted average over the concentrations used to train the GP, similar to what was proposed by D'Eramo et al 40…”
Section: Methods (mentioning)
Confidence: 99%
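The Monte Carlo weighting described in the statement above can be sketched as follows. The Gaussian-process fit, the toy monitoring data, and all variable names are illustrative assumptions rather than the cited implementation; scikit-learn's GaussianProcessRegressor is used only as a convenient stand-in for the GP.

# Hedged sketch: draw GP realizations, estimate the probability that each time
# instant holds the maximum concentration, and average the observed
# concentrations with those probabilities. Toy data, not the cited study's.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

t_train = np.linspace(0.0, 24.0, 12).reshape(-1, 1)   # monitoring times (hours), assumed
c_train = np.sin(t_train.ravel() / 4.0) + 0.1          # measured concentrations, assumed

gp = GaussianProcessRegressor(normalize_y=True).fit(t_train, c_train)

# 100 realizations; for each one, record which instant attains the maximum.
realizations = gp.sample_y(t_train, n_samples=100, random_state=0)   # shape (12, 100)
argmax_counts = np.bincount(realizations.argmax(axis=0), minlength=len(t_train))
p_max = argmax_counts / realizations.shape[1]           # P(instant i is the maximum)

# Weighted average of the training concentrations under those probabilities.
weighted_max_estimate = float(p_max @ c_train)
print(p_max, weighted_max_estimate)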
“…Since underestimation bias is not preferable (Hasselt, 2010; Lan et al., 2020), Weighted Q-learning (D'Eramo et al., 2016; Zhang et al., 2017) proposes the weighted estimator for the maximal action value, based on a weighted average of the estimated action values. However, the weights computation is only practical in a tabular setting (D'Eramo et al., 2017). Our work differs from the foregoing in that it proposes a new estimator which could be generalized into the deep Q-learning network setting.…”
Section: Related Work (mentioning)
Confidence: 98%
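The weighted estimator referred to here replaces the hard maximum over estimated action values with a weighted average in which each action's weight is the probability of it being the true maximizer. The sketch below assumes a Gaussian model of each sample mean and uses made-up numbers; it illustrates the idea in a tabular setting and is not the cited papers' exact formulation.

# Hedged sketch of a weighted estimator for the maximal action value.
# Each action's estimate is modeled as Gaussian with its sample mean and
# standard error; its weight is the probability of being the true maximizer.
import numpy as np

rng = np.random.default_rng(1)
q_means = np.array([0.2, 0.5, 0.4])     # sample means of action values (assumed)
q_sems = np.array([0.3, 0.3, 0.3])      # standard errors of those means (assumed)

# Monte Carlo estimate of P(action a is the maximizer).
draws = rng.normal(q_means, q_sems, size=(10_000, len(q_means)))
weights = np.bincount(draws.argmax(axis=1), minlength=len(q_means)) / len(draws)

# Expectation of the action values under those weights.
weighted_estimate = float(weights @ q_means)
print(weights, weighted_estimate)

Because the weights lie in [0, 1] and sum to 1, the resulting estimate falls between the mean of the action values and their maximum, which is what mitigates the overestimation discussed in the abstract.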
“…To this end, several variants of Q-learning have been developed to handle these challenges. Estimation bias is considered in [10], [11], and the estimation variance and training stability are examined in [12], [13]. The convergence rate is improved in [14], and…”
Section: Introduction (mentioning)
Confidence: 99%