2014
DOI: 10.1007/978-3-319-11662-4_3

A Survey of Preference-Based Online Learning with Bandit Algorithms

Abstract: In machine learning, the notion of multi-armed bandits refers to a class of online learning problems in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. In the standard setting, the agent learns from stochastic feedback in the form of real-valued rewards. In many applications, however, numerical reward signals are not readily available; instead, only weaker information is provided, in particular relativ…
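As a rough illustration of the distinction the abstract draws (the arm "qualities" and noise model below are invented, not taken from the surveyed paper): a standard bandit returns a stochastic numeric reward for the single chosen arm, whereas a preference-based bandit only reveals the winner of a noisy pairwise comparison.

```python
import random

# Illustrative only: arm "qualities" are made-up numbers.
qualities = [0.3, 0.5, 0.8]        # hypothetical mean rewards of three arms

def numeric_reward(arm):
    """Standard bandit feedback: a stochastic real-valued reward for one arm."""
    return qualities[arm] + random.gauss(0.0, 0.1)

def duel(i, j):
    """Preference-based feedback: only which of two arms wins a noisy comparison."""
    p_i_wins = qualities[i] / (qualities[i] + qualities[j])
    return i if random.random() < p_i_wins else j

print(numeric_reward(2))   # absolute feedback, e.g. 0.83
print(duel(0, 2))          # purely relative feedback, e.g. 2
```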

Cited by 37 publications (44 citation statements) | References 55 publications

Citation statements (ordered by relevance):
“…Hence, this work extends the literature by evaluating how reinforcement learning, in our case bandit learning, can be used to personalize the human's HRI experience. We therefore draw from research that extended multi-armed bandit learning scenario to a dueling bandit learning scenario [15]. In those scenarios the agent learns the user's preference by presenting the user two items.…”
Section: Related Work (mentioning, confidence: 99%)
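As a sketch of the interaction loop this statement describes (the items, the simulated user, and the hidden preference order are hypothetical placeholders, not taken from the cited work): the agent repeatedly presents two items, records which one the user prefers, and accumulates pairwise win statistics.

```python
import random
from collections import defaultdict

# Hypothetical items and user model, just to make the duel loop concrete.
items = ["behaviour_A", "behaviour_B", "behaviour_C"]
hidden_preference = {"behaviour_A": 1, "behaviour_B": 3, "behaviour_C": 2}

wins = defaultdict(int)
duels = defaultdict(int)

def user_picks(a, b):
    """Stand-in for the real user's choice between the two presented items."""
    return a if hidden_preference[a] > hidden_preference[b] else b

for _ in range(200):
    a, b = random.sample(items, 2)     # present the user two items (a duel)
    winner = user_picks(a, b)
    wins[winner] += 1
    duels[a] += 1
    duels[b] += 1

# Empirical preference estimate: fraction of duels each item won.
print({i: round(wins[i] / max(1, duels[i]), 2) for i in items})
```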
“…Particularly, we study a special kind of bandit learning (i.e. dueling bandit learning [15]) for PL. In contrast to standard bandit learning techniques, this approach does not require a numerical reward function.…”
Section: Introduction (mentioning, confidence: 99%)
“…Therefore we use RUCB in the following. For a general overview of PB-MAB algorithms, we refer the reader to [4].…”
Section: Preference-based Bandits (mentioning, confidence: 99%)
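For readers unfamiliar with RUCB, the following is a simplified sketch in the spirit of relative upper confidence bounds for dueling bandits; it omits details of the published algorithm (for example the weighting of candidate champions), and the preference matrix is invented for illustration.

```python
import math, random

K = 4
# Made-up preference matrix: P[i][j] is the probability that arm i beats arm j.
P = [[0.5, 0.6, 0.7, 0.8],
     [0.4, 0.5, 0.6, 0.7],
     [0.3, 0.4, 0.5, 0.6],
     [0.2, 0.3, 0.4, 0.5]]
alpha = 0.51
W = [[0] * K for _ in range(K)]          # W[i][j]: times arm i beat arm j

def ucb(i, j, t):
    """Optimistic estimate of the probability that arm i beats arm j."""
    if i == j:
        return 0.5
    n = W[i][j] + W[j][i]
    if n == 0:
        return 1.0                       # fully optimistic with no data yet
    return W[i][j] / n + math.sqrt(alpha * math.log(t) / n)

for t in range(1, 2001):
    U = [[ucb(i, j, t) for j in range(K)] for i in range(K)]
    # Champion candidates: arms not optimistically beaten by any other arm.
    C = [i for i in range(K) if all(U[i][j] >= 0.5 for j in range(K))]
    c = random.choice(C) if C else random.randrange(K)
    # Challenger: the arm with the best optimistic chance of beating the champion.
    d = max((j for j in range(K) if j != c), key=lambda j: U[j][c])
    winner, loser = (c, d) if random.random() < P[c][d] else (d, c)
    W[winner][loser] += 1

win_rates = [sum(W[i]) / max(1, sum(W[i]) + sum(W[j][i] for j in range(K)))
             for i in range(K)]
print("empirical win rates per arm:", [round(r, 2) for r in win_rates])
```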
“…• How to perform online LTR with a finite population of candidate rankers, framing it as a K-armed bandits problem [6].…”
Section: Part II (mentioning, confidence: 99%)
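A minimal sketch of that K-armed framing, assuming a handful of hypothetical candidate rankers and a made-up click model; epsilon-greedy stands in for whichever bandit policy is actually used in the cited work.

```python
import random

# Hypothetical candidate rankers (the arms) and an invented click model.
rankers = ["bm25", "lm_dirichlet", "neural_v1"]
click_prob = {"bm25": 0.20, "lm_dirichlet": 0.25, "neural_v1": 0.35}

pulls = {r: 0 for r in rankers}
clicks = {r: 0.0 for r in rankers}
epsilon = 0.1

def mean_reward(r):
    # Untried rankers get +inf so each arm is tried at least once.
    return clicks[r] / pulls[r] if pulls[r] else float("inf")

for _ in range(5000):
    if random.random() < epsilon:
        arm = random.choice(rankers)           # explore
    else:
        arm = max(rankers, key=mean_reward)    # exploit current best estimate
    clicked = random.random() < click_prob[arm]   # simulated click on its ranking
    pulls[arm] += 1
    clicks[arm] += clicked

print({r: round(mean_reward(r), 3) for r in rankers})
```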
“…• Online learning algorithms that have not been used in the LTR settings, with the goal of inspiring researchers to adapt those algorithms for use in the IR community [4,5]. [10 minutes] Datasets and resources.…”
Section: Part II (mentioning, confidence: 99%)