IEEE/WIC/ACM International Conference on Web Intelligence 2019
DOI: 10.1145/3350546.3352501

Reinforcement Learning for Personalized Dialogue Management

Abstract: Language systems have been of great interest to the research community and have recently reached the mass market through various assistant platforms on the web. Reinforcement Learning methods that optimize dialogue policies have seen successes in past years and have recently been extended into methods that personalize the dialogue, e.g. take the personal context of users into account. These works, however, are limited to personalization to a single user with whom they require multiple interactions and do not g…

Cited by 10 publications (5 citation statements)
References 24 publications

“…Learning such patterns for each environment individually may require a substantial number of trajectories and may be infeasible in some settings, such as those where users cannot be identified across trajectories or those where each user is expected to contribute only one trajectory to Dᵢ. An alternative approach is to define a single agent and MDP with user-specific information in the state space S and learn a single π* for all users [47]. In some settings, users can be described using a function that returns a vector representation of the l features that characterize a user, φ : U → φ₁(U), …”
Section: Den Hengst et al. / Reinforcement Learning for Personalization
Mentioning confidence: 99%
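
To make the single-policy formulation in the statement above concrete, here is a minimal sketch of augmenting the dialogue state with user features so that one policy π* is learned over the combined space. The feature map φ and the three feature names are illustrative assumptions, not details from the cited papers:

```python
import numpy as np

def user_features(user):
    """Hypothetical φ : U → R^l. The features (age bucket, expertise,
    verbosity preference) are illustrative assumptions only."""
    return np.array([user["age_bucket"], user["expertise"],
                     user["verbosity"]], dtype=np.float32)

def personalized_state(dialogue_state, user):
    """Concatenate the task state with φ(u), yielding the augmented
    state space over which a single policy is trained for all users."""
    return np.concatenate([dialogue_state, user_features(user)])
```

The trade-off is a larger state space, but trajectories from every user now update the same policy, which sidesteps the one-trajectory-per-user problem described above.
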
“…Models can also be used to interact with the RL agent in simulation. For example, dialogue agent modules may be trained by interacting with a simulated chatbot user [47,95,105]. Secondly, upfront knowledge may be available in the form of data on human responses to system behavior.…”
Section: A Classification of Personalization Settings
Mentioning confidence: 99%
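
A toy illustration of that simulated-user setup, with an entirely hypothetical task and interface rather than the simulators used in the cited works: a tabular Q-learning dialogue agent is trained against a scripted user model, which supplies the many trajectories RL needs without involving real users.

```python
import random
from collections import defaultdict

class SimulatedUser:
    """Scripted stand-in for a real user (assumed interface): the user
    wants one of three items; the agent should ask, then confirm the
    right one. reset() -> state, step(action) -> (state, reward, done)."""
    def reset(self):
        self.goal = random.choice(["a", "b", "c"])
        return "start"

    def step(self, action):
        if action == "ask":
            return f"heard_{self.goal}", -1.0, False   # per-turn penalty
        if action == f"confirm_{self.goal}":
            return "end", 10.0, True                   # task success
        return "end", -10.0, True                      # wrong confirmation

ACTIONS = ["ask", "confirm_a", "confirm_b", "confirm_c"]

def train(episodes=5000, alpha=0.1, gamma=0.95, eps=0.1):
    sim, Q = SimulatedUser(), defaultdict(float)
    for _ in range(episodes):
        state, done = sim.reset(), False
        while not done:
            # ε-greedy selection over dialogue acts
            action = (random.choice(ACTIONS) if random.random() < eps
                      else max(ACTIONS, key=lambda a: Q[(state, a)]))
            nxt, reward, done = sim.step(action)
            target = reward if done else reward + gamma * max(
                Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = nxt
    return Q

Q = train()
# The learned greedy policy asks first, then confirms the heard item.
```

In the cited systems the scripted user is replaced by a learned user model, but the training loop has the same shape.
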
“…For example, a title keyword search for 'personali' or 'personaliz' returns 124 articles from the ACL Anthology and a further 10 from the arXiv Computation and Language (cs.CL) subclass. These systems cover a wide range of tasks including dialogue [127,157,36,39,41,109,133,146,149,206,238,244], recipe or diet generation [147,87,159], summarisation [215,240], machine translation [156,153,194,237], QA [137,193], search and information retrieval [4,40,59,70,245], sentiment analysis [80,155,226], domain classification [129,114,113], entity resolution [132], and aggression or abuse detection [107,108]; and are applied to a number of societal domains such as education [118,163,241], medicine [3,15,225,235] and news consumption…”
Section: From Implicit to Explicit Personalisation
Mentioning confidence: 99%
“…Preferences are defined in both personal and universal contexts, reflecting the persistent difficulties of separating the two. Ficler and Goldberg (2017) focus on modulating formality depending on context, while others focus on the personalisation of language models, such as reflecting author personality in machine translation (Mirkin and Meunier, 2015; Rabinovich et al., 2017); providing financial recommendations via chat bots (Den Hengst et al., 2019); or enabling customised online shopping (Mo et al., 2016). Most studies target human preferences assumed to be commonly-held and stable, such as word order (Futrell and Levy, 2019), sense making (De Deyne et al., 2016; Seminck and Amsili, 2017) and vocabulary matching (Campano et al., 2014; Dubuisson Duplessis et al., 2017).…”
Section: Conceptual Classification
Mentioning confidence: 99%