2021 · Preprint
DOI: 10.48550/arxiv.2111.04850

Dueling RL: Reinforcement Learning with Trajectory Preferences

Abstract: We consider the problem of preference-based reinforcement learning (PbRL), where, unlike traditional reinforcement learning, an agent receives feedback only as a 1-bit (0/1) preference over a pair of trajectories rather than absolute rewards for them. The success of the traditional RL framework crucially relies on the underlying agent-reward model; this, however, depends on how accurately a system designer can express an appropriate reward function, which is often a non-trivial task. The main novelty of our fram…
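To make the 1-bit trajectory feedback concrete, here is a minimal Python sketch of a preference oracle, assuming a Bradley-Terry (logistic) comparison model over latent trajectory returns. The function names and the reward function are hypothetical illustrations, not the paper's construction.

```python
import numpy as np

def trajectory_return(traj, reward_fn):
    """Sum of per-step latent rewards along a trajectory of (state, action) pairs."""
    return sum(reward_fn(s, a) for s, a in traj)

def preference_feedback(traj_0, traj_1, reward_fn, rng):
    """Return a 1-bit preference: 1 if traj_1 is preferred over traj_0.

    Assumed Bradley-Terry model: traj_1 wins with probability
    sigmoid(return(traj_1) - return(traj_0)).
    """
    gap = trajectory_return(traj_1, reward_fn) - trajectory_return(traj_0, reward_fn)
    p_1_wins = 1.0 / (1.0 + np.exp(-gap))
    return int(rng.random() < p_1_wins)

# Hypothetical usage: the latent reward favors action 1, so traj_1 usually wins.
rng = np.random.default_rng(0)
reward_fn = lambda s, a: float(a)
traj_0 = [(0, 0), (1, 0)]
traj_1 = [(0, 1), (1, 1)]
bit = preference_feedback(traj_0, traj_1, reward_fn, rng)
```

Note that the agent only ever observes `bit`; the scalar returns remain latent, which is what distinguishes PbRL from reward-based RL.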

Cited by 3 publications (8 citation statements) · References 14 publications
“…Then the α-Eluder dimension of F_T is at most O(dr^2 log(rLS_h/α)). Therefore, our results subsume the setting of logistic preference functions (Pacchiano et al., 2021) as a special case.…”
Section: General Function Approximation (mentioning)
Confidence: 69%
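For context, the "logistic preference functions" referred to in this statement are commonly written as a Bradley-Terry link applied to trajectory feature differences. The rendering below is a plausible sketch; the exact parameterization in Pacchiano et al. (2021) may differ.

```latex
% Bradley-Terry / logistic preference model over trajectory features.
% phi : trajectories -> R^d is a known feature map, theta* the unknown parameter.
P(\tau^1 \succ \tau^0) = \sigma\!\big(\langle \theta^\ast,\, \phi(\tau^1) - \phi(\tau^0) \rangle\big),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}.
```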
“…The function spaces are general sets of functions, which may be either finitely parameterized or nonparametric. This setting is more general than the previous theoretical results for PbRL (Novoseller et al., 2020; Xu et al., 2020b; Pacchiano et al., 2021). Our contributions are summarized as follows:…”
Section: Introduction (mentioning)
Confidence: 85%
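Where the statement above contrasts general function classes with finitely parameterized ones, the following minimal Python sketch shows the simplest parameterized instance: fitting a linear-logistic preference model to 1-bit comparison labels by gradient ascent on the log-likelihood. All names are illustrative; this is not the estimator from any of the cited papers.

```python
import numpy as np

def fit_preference_model(X, y, lr=0.1, n_iters=500):
    """Maximum-likelihood fit of a linear-logistic preference model.

    X : (n, d) array of feature differences phi(tau_1) - phi(tau_0).
    y : (n,) array of 1-bit labels (1 if tau_1 was preferred).
    Returns the estimated parameter vector theta.
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))  # predicted win probabilities
        theta += lr * (X.T @ (y - p)) / n       # gradient ascent on log-likelihood
    return theta

# Hypothetical usage: recover a planted parameter from noisy 1-bit labels.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0])
X = rng.normal(size=(1000, 2))
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-(X @ theta_true)))).astype(float)
theta_hat = fit_preference_model(X, y)
```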