2014
DOI: 10.1007/978-3-319-11662-4_3

A Survey of Preference-Based Online Learning with Bandit Algorithms

Abstract: In machine learning, the notion of multi-armed bandits refers to a class of online learning problems in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. In the standard setting, the agent learns from stochastic feedback in the form of real-valued rewards. In many applications, however, numerical reward signals are not readily available; instead, only weaker information is provided, in particular relativ…
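As a rough illustration of the distinction the abstract draws (the arm "qualities" and noise model below are invented, not taken from the surveyed paper): a standard bandit returns a stochastic numeric reward for the single chosen arm, whereas a preference-based bandit only reveals the winner of a noisy pairwise comparison.

```python
import random

# Illustrative only: arm "qualities" are made-up numbers.
qualities = [0.3, 0.5, 0.8]        # hypothetical mean rewards of three arms

def numeric_reward(arm):
    """Standard bandit feedback: a stochastic real-valued reward for one arm."""
    return qualities[arm] + random.gauss(0.0, 0.1)

def duel(i, j):
    """Preference-based feedback: only which of two arms wins a noisy comparison."""
    p_i_wins = qualities[i] / (qualities[i] + qualities[j])
    return i if random.random() < p_i_wins else j

print(numeric_reward(2))   # absolute feedback, e.g. 0.83
print(duel(0, 2))          # purely relative feedback, e.g. 2
```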

Cited by 37 publications (44 citation statements) | References 55 publications

Citation statements (ordered by relevance):
“…Hence, this work extends the literature by evaluating how reinforcement learning, in our case bandit learning, can be used to personalize the human's HRI experience. We therefore draw from research that extended multi-armed bandit learning scenario to a dueling bandit learning scenario [15]. In those scenarios the agent learns the user's preference by presenting the user two items.…”
Section: Related Work (mentioning, confidence: 99%)
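As a sketch of the interaction loop this statement describes (the items, the simulated user, and the hidden preference order are hypothetical placeholders, not taken from the cited work): the agent repeatedly presents two items, records which one the user prefers, and accumulates pairwise win statistics.

```python
import random
from collections import defaultdict

# Hypothetical items and user model, just to make the duel loop concrete.
items = ["behaviour_A", "behaviour_B", "behaviour_C"]
hidden_preference = {"behaviour_A": 1, "behaviour_B": 3, "behaviour_C": 2}

wins = defaultdict(int)
duels = defaultdict(int)

def user_picks(a, b):
    """Stand-in for the real user's choice between the two presented items."""
    return a if hidden_preference[a] > hidden_preference[b] else b

for _ in range(200):
    a, b = random.sample(items, 2)     # present the user two items (a duel)
    winner = user_picks(a, b)
    wins[winner] += 1
    duels[a] += 1
    duels[b] += 1

# Empirical preference estimate: fraction of duels each item won.
print({i: round(wins[i] / max(1, duels[i]), 2) for i in items})
```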
“…Particularly, we study a special kind of bandit learning (i.e. dueling bandit learning [15]) for PL. In contrast to standard bandit learning techniques, this approach does not require a numerical reward function.…”
Section: Introduction (mentioning, confidence: 99%)
“…Therefore we use RUCB in the following. For a general overview of PB-MAB algorithms, we refer the reader to [4].…”
Section: Preference-based Bandits (mentioning, confidence: 99%)
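For readers unfamiliar with RUCB, the following is a simplified sketch in the spirit of relative upper confidence bounds for dueling bandits; it omits details of the published algorithm (for example the weighting of candidate champions), and the preference matrix is invented for illustration.

```python
import math, random

K = 4
# Made-up preference matrix: P[i][j] is the probability that arm i beats arm j.
P = [[0.5, 0.6, 0.7, 0.8],
     [0.4, 0.5, 0.6, 0.7],
     [0.3, 0.4, 0.5, 0.6],
     [0.2, 0.3, 0.4, 0.5]]
alpha = 0.51
W = [[0] * K for _ in range(K)]          # W[i][j]: times arm i beat arm j

def ucb(i, j, t):
    """Optimistic estimate of the probability that arm i beats arm j."""
    if i == j:
        return 0.5
    n = W[i][j] + W[j][i]
    if n == 0:
        return 1.0                       # fully optimistic with no data yet
    return W[i][j] / n + math.sqrt(alpha * math.log(t) / n)

for t in range(1, 2001):
    U = [[ucb(i, j, t) for j in range(K)] for i in range(K)]
    # Champion candidates: arms not optimistically beaten by any other arm.
    C = [i for i in range(K) if all(U[i][j] >= 0.5 for j in range(K))]
    c = random.choice(C) if C else random.randrange(K)
    # Challenger: the arm with the best optimistic chance of beating the champion.
    d = max((j for j in range(K) if j != c), key=lambda j: U[j][c])
    winner, loser = (c, d) if random.random() < P[c][d] else (d, c)
    W[winner][loser] += 1

win_rates = [sum(W[i]) / max(1, sum(W[i]) + sum(W[j][i] for j in range(K)))
             for i in range(K)]
print("empirical win rates per arm:", [round(r, 2) for r in win_rates])
```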
“…• How to perform online LTR with a finite population of candidate rankers, framing it as a K-armed bandits problem [6].…”
Section: Part II (mentioning, confidence: 99%)
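A minimal sketch of that K-armed framing, assuming a handful of hypothetical candidate rankers and a made-up click model; epsilon-greedy stands in for whichever bandit policy is actually used in the cited work.

```python
import random

# Hypothetical candidate rankers (the arms) and an invented click model.
rankers = ["bm25", "lm_dirichlet", "neural_v1"]
click_prob = {"bm25": 0.20, "lm_dirichlet": 0.25, "neural_v1": 0.35}

pulls = {r: 0 for r in rankers}
clicks = {r: 0.0 for r in rankers}
epsilon = 0.1

def mean_reward(r):
    # Untried rankers get +inf so each arm is tried at least once.
    return clicks[r] / pulls[r] if pulls[r] else float("inf")

for _ in range(5000):
    if random.random() < epsilon:
        arm = random.choice(rankers)           # explore
    else:
        arm = max(rankers, key=mean_reward)    # exploit current best estimate
    clicked = random.random() < click_prob[arm]   # simulated click on its ranking
    pulls[arm] += 1
    clicks[arm] += clicked

print({r: round(mean_reward(r), 3) for r in rankers})
```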
“…• Online learning algorithms that have not been used in the LTR settings, with the goal of inspiring researchers to adapt those algorithms for use in the IR community [4,5]. [10 minutes] Datasets and resources.…”
Section: Part II (mentioning, confidence: 99%)