2021
DOI: 10.48550/arxiv.2103.04047
Preprint

Reinforcement Learning, Bit by Bit

Abstract: Reinforcement learning agents have demonstrated remarkable achievements in simulated environments. Data efficiency poses an impediment to carrying this success over to real environments. The design of data-efficient agents calls for a deeper understanding of information acquisition and representation. We develop concepts and establish a regret bound that together offer principled guidance. The bound sheds light on questions of what information to seek, how to seek that information, and what information to reta…

citations
Cited by 11 publications
(40 citation statements)
references
References 50 publications
“…Note that this theorem is not the main result of the paper - it serves as a foundation based on which we develop our analysis and a baseline against which we compare our main result. Similar information-theoretic bounds have been established in previous studies, a closely related one being (Lu et al, 2021a).…”
Section: Regret Bound
Citation type: supporting
confidence: 79%
“…One way of bounding Bayesian regret is through first bounding an agent's information ratio, which is a statistic that quantifies how the agent trades off between regret and information. Various versions of the information ratio have been proposed and studied over the past decade (Russo and Van Roy, 2014a; Bubeck et al, 2015; Russo and Van Roy, 2016; Bubeck and Eldan, 2016; Russo and Van Roy, 2018a; Dong and Van Roy, 2018; Russo and Van Roy, 2018b; Nikolov et al, 2018; Zimmert and Lattimore, 2019; Lattimore and Szepesvári, 2019; Lu and Van Roy, 2019; Bubeck and Sellke, 2020; Lattimore and Szepesvári, 2020; Lattimore and György, 2020; Kirschner et al, 2020; Lu et al, 2021a; Lattimore and Hao, 2021; Devraj et al, 2021). Each depends on beliefs about the environment, as expressed by a prior distribution and likelihood function.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Information Ratios. Of all the related works described in this section, the work [33] on IDS for MABs discussed above and the recent work [28] developing an information ratio for general problems are most relevant to our current paper. In [28], the authors develop a powerful, abstract framework for reasoning about the design of algorithms for solving sequential decision-making problems, including MDPs.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Prior works in multi-armed bandits and RL [33,28] seek to balance the goals of exploration and exploitation by minimizing the ratio of cost incurred - formulated as regret - to information acquired when determining which action or sequence of actions to choose. See Section 3 for an overview of this line of work.…”
Section: Introduction
Citation type: mentioning
confidence: 99%