2020
DOI: 10.1016/j.eswa.2020.113650

Towards integrated dialogue policy learning for multiple domains and intents using Hierarchical Deep Reinforcement Learning

Cited by 16 publications (12 citation statements)
References 16 publications
“…To improve investigation efficacy and patient satisfaction, clinics used to have different departments such as ENT (Ear, Nose, and Throat) and pediatrics, etc. Motivated by the real-world scenario and the promising results obtained by Liao et al [12, 37], we also utilized a hierarchical policy learning method, where the higher-level policy (controller) activates one of the lower-level policies (departmental) depending on patients’ self-report and other symptoms and the department policy conducts group-specific symptom investigation.…”
Section: Methods (mentioning)
confidence: 99%
“…Reinforcement Learning (RL) approaches have tried to model a generation process ProKnow by rewarding the model with adherence to ground truth using general language understanding evaluations (GLUE) task metrics such as BLEU-n and ROUGE-L. However, they do not explicitly model clinically practiced ProKnow which enables explainable NLG that end-users and domain experts can trust (Wang et al, 2018; Zhang and Bansal, 2019; Saha et al, 2020). Hence, a method that effectively utilizes ProKnow will contribute to algorithmic explainability in the NLG process (Gaur et al, 2021; Sheth et al, 2021).…”
Section: Related Work (mentioning)
confidence: 99%
“…RL does not require any data to be given in advance, which obtains the reward by the continuous interaction between agent and environment. By employing the RL, a system dynamically adjusts the parameters to maximize the accumulated reward [51, 52]. In RL, the return function is usually defined to represent the sum of the discounts of all rewards observed by the agent after a certain state, i.e., $G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}$, where $\gamma$ is the discount factor ($0 \le \gamma \le 1$), which represents the weight relationship between future rewards and immediate reward, and $R$ is the immediate reward.…”
Section: Preliminaries (mentioning)
confidence: 99%
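
The discounted return described in the last citation statement (the sum of all rewards observed after a state, each weighted by a power of the discount factor) can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not code from any of the citing papers: the function name `discounted_return`, the finite reward list, and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of rewards weighted by increasing powers of the discount factor.

    rewards[k] is the immediate reward observed k steps after the state of
    interest (an assumed, finite episode). gamma is the discount factor and
    is assumed to lie in [0, 1].
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is assumed to lie in [0, 1]")

    total = 0.0
    # Accumulate from the last reward backwards: G = r + gamma * G_next.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.75
```

A discount factor close to 0 makes the agent weigh the immediate reward almost exclusively, while a value close to 1 gives future rewards nearly equal weight.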