Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval 2017
DOI: 10.1145/3121050.3121098

Towards Learning Reward Functions from User Interactions

Abstract: In the physical world, people have dynamic preferences, e.g., the same situation can lead to satisfaction for some humans and to frustration for others. Personalization is called for. The same observation holds for online behavior with interactive systems. It is natural to represent the behavior of users who are engaging with interactive systems, such as a search engine or a recommender system, as a sequence of actions where each next action depends on the current situation and the user reward of taking a parti…
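The abstract frames a user's session as a sequence of actions, each chosen given the current situation and the reward the user gets from it. The abstract is truncated before the method is described, so this is only an illustration under loudly stated assumptions and not the authors' approach. It assumes a linear reward r = w · φ over hypothetical per-action features, a myopic user who picks among the candidate actions of each step with a Boltzmann (softmax) policy over that reward, and simulated logs. Under those assumptions the reward weights w can be recovered by maximum-likelihood gradient ascent, a simple one-step relative of inverse reinforcement learning. The functions `simulate_logs` and `fit_reward_weights` and the two features (relevance, effort) are hypothetical.

```python
import math
import random

# ASSUMPTION (illustrative only, not the paper's model): each candidate action
# in a situation is described by a feature vector phi, the user's reward is
# linear, r = w . phi, and the user picks an action with probability
# proportional to exp(r) (a Boltzmann / softmax choice model).


def reward(w, phi):
    """Linear reward w . phi for one candidate action."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def choice_probs(w, candidates):
    """Softmax choice probabilities over the candidate actions of one step."""
    scores = [reward(w, phi) for phi in candidates]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_logs(true_w, n_steps, n_candidates=3, seed=0):
    """Hypothetical interaction log: list of (candidate features, chosen index)."""
    rng = random.Random(seed)
    steps = []
    for _ in range(n_steps):
        cands = [[rng.random() for _ in true_w] for _ in range(n_candidates)]
        probs = choice_probs(true_w, cands)
        chosen = rng.choices(range(n_candidates), weights=probs)[0]
        steps.append((cands, chosen))
    return steps


def fit_reward_weights(steps, dim, lr=1.0, iters=200):
    """Maximum-likelihood estimate of w by gradient ascent on the mean log-likelihood."""
    w = [0.0] * dim
    for _ in range(iters):
        grad = [0.0] * dim
        for cands, chosen in steps:
            p = choice_probs(w, cands)
            for k in range(dim):
                # d/dw_k log P(chosen) = phi_k(chosen) - E_p[phi_k]
                expected = sum(pj * phi[k] for pj, phi in zip(p, cands))
                grad[k] += cands[chosen][k] - expected
        w = [wi + lr * g / len(steps) for wi, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Hypothetical features: [relevance of a result, effort needed to inspect it]
    logs = simulate_logs(true_w=[2.0, -1.0], n_steps=1000)
    w_hat = fit_reward_weights(logs, dim=2)
    print("estimated reward weights:", [round(x, 2) for x in w_hat])
```

The softmax log-likelihood is concave in w, so plain gradient ascent is enough for a sketch; a sequential model in which actions change future situations would need a full inverse RL method instead of this per-step choice model.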

Cited by 6 publications (4 citation statements)
References 35 publications

“…Less frequently mentioned methods include the use of interaction logs (Z. Li, Kiseleva, De Rijke, & Grotov, 2017) and autonomous data collection with network-shared knowledge (Ahrndt, Lützenberger, & Prochnow, 2016).…”
Section: System Training Based on Human Empirical Data (mentioning, confidence 99%)

“…All in all, it can be particularly challenging to translate high-level complex RS goals to an individual numeric reward signal [9,11], and there is much work to be explored in the field of reward engineering for RS. Possible topics are: designing/modeling goals [8,15], reward hacking [2,16,23], incorporating artificial curiosity/intrinsic motivation [4,17,20,21] (e.g., by explicitly rewarding the RS for learning new information about users) and inverse RL [14].…”
Section: RS Environment Main Components (mentioning, confidence 99%)

“…Currently, optimizing interactive systems relies on explicit assumptions about users' objectives in terms of their needs and frustrations (Li et al, 2017b). Commonly, an objective function is manually designed for a particular task to reflect the quality of an interactive system, e.g., in terms of user satisfaction (Kelly, 2009, 2015), user effort (Yilmaz et al, 2014) or other domain-specific metrics, such as relevance judgements in information retrieval (Järvelin and Kekäläinen, 2002; Saracevic, 1975; Saracevic et al, 1988; Drutsa et al, 2015; Dupret and Lalmas, 2013), user feedbacks (e.g.…”
Section: Introduction (mentioning, confidence 99%)
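The last statement lists hand-designed objectives, including metrics built on graded relevance judgements such as the cumulated-gain measures of Järvelin and Kekäläinen (2002). As a hedged illustration of what such a manually designed objective looks like, this sketch computes DCG and nDCG with the common log2(rank + 1) discount. That discount is a widely used variant and not necessarily the exact formulation of the cited paper; the function names and example judgements are hypothetical.

```python
import math

# Illustrative sketch: discounted cumulative gain (DCG) and its normalised
# form (nDCG), using the common log2(rank + 1) discount. Graded relevance
# judgements are given in ranked order, best-ranked result first.


def dcg(gains, k=None):
    """DCG@k: sum of gain / log2(rank + 1) over the top-k ranks (rank starts at 1)."""
    top = gains if k is None else gains[:k]
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(top, start=1))


def ndcg(gains, k=None):
    """nDCG@k: DCG of the ranking divided by DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    judgements = [3, 2, 3, 0, 1, 2]  # hypothetical graded relevance, ranked order
    print(round(dcg(judgements), 3))   # 6.861
    print(round(ndcg(judgements), 3))  # 0.961
```

Normalising by the ideal ordering of the same list keeps scores in [0, 1]; an evaluation that has judgements for documents outside the returned list would compute the ideal DCG from the full judged pool instead.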