2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
DOI: 10.1109/smc53654.2022.9945333

Advances in Preference-based Reinforcement Learning: A Review

Cited by 4 publications (2 citation statements)
References 19 publications
“…An alternative to mitigate the sparsity of reward feedback is preference-based reinforcement learning (PbRL) [8], [9]. In PbRL, instead of directly receiving the instant reward information on each encountered state-action pair, the agent only obtains 1-bit preference feedback for each state-action pair or trajectory from a human overseer [10], [11].…”
mentioning
confidence: 99%
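The 1-bit feedback model described in the quoted passage can be made concrete with a small sketch. The snippet below simulates a Bradley-Terry style overseer that, shown two trajectories, emits a single preference bit; the toy task, the trajectory encoding, and the hidden overseer reward are illustrative assumptions, not details taken from the reviewed paper.

```python
import numpy as np

def trajectory_return(traj, reward_fn):
    """Sum the overseer's hidden reward over a trajectory of (state, action) pairs."""
    return sum(reward_fn(s, a) for s, a in traj)

def preference_feedback(traj_a, traj_b, reward_fn, rng):
    """Return a single preference bit: 1 if traj_a is preferred to traj_b, else 0.

    The overseer follows a Bradley-Terry model: the probability of preferring
    traj_a grows with its return advantage, so labels are noisy rather than exact.
    """
    advantage = trajectory_return(traj_a, reward_fn) - trajectory_return(traj_b, reward_fn)
    p_prefer_a = 1.0 / (1.0 + np.exp(-advantage))
    return int(rng.random() < p_prefer_a)

# Toy example: the hidden reward favours action 1 ("move right"); the agent
# never sees this reward, only the returned preference bit.
rng = np.random.default_rng(0)
hidden_reward = lambda s, a: float(a == 1)
traj_a = [(0, 1), (1, 1), (2, 1)]   # always moves right
traj_b = [(0, 0), (0, 1), (1, 0)]   # mostly stays put
print("prefer traj_a over traj_b:", preference_feedback(traj_a, traj_b, hidden_reward, rng))
```

The single returned bit is the only reward signal a PbRL agent receives for the whole trajectory pair, which is precisely why it is an appealing alternative when dense instant rewards are unavailable.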
“…On the other hand, in the literature of ranking, most of the theoretical work focuses on the tabular case where the rewards for different actions are uncorrelated (Feige et al., 1994; Shah et al., 2015; Shah and Wainwright, 2017; Heckel et al., 2018; Mao et al., 2018; Jang et al., 2017; Chen et al., 2013; Chen and Suh, 2015; Rajkumar and Agarwal, 2014; Negahban et al., 2018; Hajek et al., 2014; Heckel et al., 2019). A majority of the empirical literature focuses on the framework of learning to rank (MLE) under general function approximation, especially when the reward is parameterized by a neural network (Liu et al., 2009; Xia et al., 2008; Cao et al., 2007; Christiano et al., 2017a; Ouyang et al., 2022; Brown et al., 2019; Shin et al., 2023; Busa-Fekete et al., 2014; Wirth et al., 2016, 2017; Christiano et al., 2017b; Abdelkareem et al., 2022). The similar idea of RL with AI feedback also learns a reward model from preferences (Bai et al., 2022b), except that the preferences are labeled by another AI model instead of a human.…”
Section: Related Workmentioning
confidence: 99%
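The learning-to-rank (MLE) setup this passage refers to can likewise be sketched: fit reward parameters by maximizing the Bradley-Terry likelihood of observed preference labels. In the sketch below a linear reward on a hand-made trajectory feature stands in for the neural-network parameterization; the feature, the synthetic preference dataset, and all function names are illustrative assumptions rather than any specific method from the cited works.

```python
import numpy as np

def traj_features(traj):
    """Toy feature: total number of rightward moves (action 1) in a trajectory."""
    return np.array([sum(a == 1 for _, a in traj)], dtype=float)

def bt_nll_grad(theta, pairs, labels):
    """Gradient of the negative log-likelihood under a Bradley-Terry preference model."""
    grad = np.zeros_like(theta)
    for (fa, fb), y in zip(pairs, labels):
        p_a = 1.0 / (1.0 + np.exp(-(theta @ (fa - fb))))   # P(first trajectory preferred)
        grad += (p_a - y) * (fa - fb)                        # d(-log lik)/d(theta)
    return grad / len(pairs)

# Small synthetic preference dataset: (traj_a, traj_b, label), label = 1 if traj_a preferred.
dataset = [
    ([(0, 1), (1, 1)], [(0, 0), (0, 0)], 1),   # more rightward trajectory preferred
    ([(0, 0), (0, 1)], [(0, 1), (1, 1)], 0),   # less rightward trajectory not preferred
]
pairs = [(traj_features(a), traj_features(b)) for a, b, _ in dataset]
labels = [y for _, _, y in dataset]

# Plain gradient descent on the Bradley-Terry negative log-likelihood.
theta = np.zeros(1)
for _ in range(200):
    theta -= 0.5 * bt_nll_grad(theta, pairs, labels)
print("learned reward weight on rightward moves:", theta)   # ends up positive
```

Replacing the linear feature map with a neural network gives the reward-model training loop used in the empirical work cited above; the preference likelihood being maximized stays the same.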