2023
DOI: 10.1609/aaai.v37i8.26119
|View full text |Cite
|
Sign up to set email alerts
|

Bilinear Exponential Family of MDPs: Frequentist Regret Bound with Tractable Exploration & Planning

Abstract: We study the problem of episodic reinforcement learning in continuous state-action spaces with unknown rewards and transitions. Specifically, we consider the setting where the rewards and transitions are modeled using parametric bilinear exponential families. We propose an algorithm, that a) uses penalized maximum likelihood estimators to learn the unknown parameters, b) injects a calibrated Gaussian noise in the parameter of rewards to ensure exploration, and c) leverages linearity of the bilinear exponential… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 13 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?