Adaptively Shaping Reinforcement Learning Agents via Human Reward
2018 | DOI: 10.1007/978-3-319-97304-3_7

Cited by 6 publications (8 citation statements)
References 8 publications
“…Yu et al. [16] proposed an adaptive shaping algorithm that can combine different human–agent RL methods and dynamically select the most favourable method during the learning process. There is also a large body of work using expert advice or demonstrations to shape rewards in RL problems [29, 30].…”
Section: Related Work (mentioning, confidence: 99%)
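The quoted description of [16], an agent that keeps several ways of using human reward available and switches to whichever is currently most favourable, can be illustrated with a small sketch. The code below is not the authors' algorithm: it assumes a toy chain environment, a simulated `human_feedback` signal, and two hypothetical strategies (adding human reward to the environment reward versus biasing greedy action selection), with an epsilon-greedy selector over running average returns choosing between them each episode.

```python
import random
from collections import defaultdict

# Illustrative sketch only, not the algorithm of Yu et al. [16]:
# a tabular Q-learning agent that, each episode, picks one of two hypothetical
# ways of using human reward (adding it to the environment reward, or using it
# to bias greedy action selection) via an epsilon-greedy choice over each
# strategy's running average return.

class ChainEnv:
    """Toy 1-D chain: the agent must move right to reach the goal state."""
    n_states, n_actions = 8, 2          # actions: 0 = left, 1 = right

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = max(0, self.s - 1) if a == 0 else min(self.n_states - 1, self.s + 1)
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done

def human_feedback(state, action):
    """Simulated human reward: approves moving right (an assumption for the demo)."""
    return 0.5 if action == 1 else -0.5

STRATEGIES = ["reward_shaping", "action_biasing"]

def choose_strategy(avg_return, eps=0.2):
    """Epsilon-greedy selection of the currently most favourable strategy."""
    if random.random() < eps:
        return random.choice(STRATEGIES)
    return max(STRATEGIES, key=lambda s: avg_return[s])

def run_episode(env, Q, H, strategy, alpha=0.1, gamma=0.95, eps=0.1, max_steps=50):
    state, total, done, steps = env.reset(), 0.0, False, 0
    while not done and steps < max_steps:
        q = list(Q[state])
        if strategy == "action_biasing":       # human signal biases the greedy pick
            q = [qv + hv for qv, hv in zip(q, H[state])]
        if random.random() < eps:
            action = random.randrange(env.n_actions)
        else:
            action = max(range(env.n_actions), key=lambda a: q[a])
        nxt, reward, done = env.step(action)
        h = human_feedback(state, action)
        H[state][action] += h
        shaped = reward + (h if strategy == "reward_shaping" else 0.0)
        Q[state][action] += alpha * (shaped + gamma * max(Q[nxt]) - Q[state][action])
        state, total, steps = nxt, total + reward, steps + 1
    return total

env = ChainEnv()
Q = defaultdict(lambda: [0.0] * env.n_actions)   # environment value estimates
H = defaultdict(lambda: [0.0] * env.n_actions)   # accumulated human reward per (s, a)
avg_return = {s: 0.0 for s in STRATEGIES}
counts = {s: 0 for s in STRATEGIES}

for _ in range(200):
    strat = choose_strategy(avg_return)
    ret = run_episode(env, Q, H, strat)
    counts[strat] += 1
    avg_return[strat] += (ret - avg_return[strat]) / counts[strat]   # incremental mean

print("strategy usage:", counts)
print("running average return per strategy:", avg_return)
```

With only 200 episodes the selector's preference is noisy; the point is only to show the shape of an adaptive-shaping loop in which how human reward is used is itself a learned choice.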
“…As a result, a large body of work has proposed human knowledge‐based RL algorithms [10–15]. These human–agent RL algorithms convert manual instructions into a form that RL agents can recognize in order to embed human knowledge into the learning process [16]. Human–agent reinforcement learning methods have distinct advantages for different reinforcement learning tasks or at different learning stages [10].…”
Section: Introduction (mentioning, confidence: 99%)
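The phrase "convert manual instructions into a form that RL agents can recognize" is generic in the quote above; one standard way to do it is potential-based reward shaping, where a human-supplied potential over states becomes an extra term F(s, s') = γ·Φ(s') - Φ(s) in the TD target. The sketch below assumes that technique and a hypothetical potential function; it is an illustration, not the construction used in the cited papers.

```python
from collections import defaultdict

# Minimal sketch: turning a human heuristic ("states nearer the goal are better")
# into a potential-based shaping reward F(s, s') = gamma * phi(s') - phi(s),
# added to an otherwise standard tabular Q-learning target.
# The potential, the integer states, and the two-action layout are assumptions.

GAMMA = 0.95

def phi(state):
    """Hypothetical human-supplied potential over states."""
    return float(state)

def shaped_td_update(Q, state, action, reward, next_state, alpha=0.1):
    shaping = GAMMA * phi(next_state) - phi(state)      # potential-based term
    target = reward + shaping + GAMMA * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])

Q = defaultdict(lambda: [0.0, 0.0])                      # two actions per state
shaped_td_update(Q, state=2, action=1, reward=0.0, next_state=3)
print(Q[2])   # the transition toward the goal gets credit from the shaping term
```

A known property of this construction (Ng et al., 1999) is that it leaves the optimal policy unchanged, which is one reason it is a popular way to inject heuristic human knowledge.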
“…The process included: (i) data collection using profile points and analysis, with data logs created and extracted at very short intervals of 5 minutes; (ii) extraction and analysis of workload demand patterns over a long period of time (the last one year); (iii) generation of synthetic workload patterns; (iv) execution of stress tests in a test environment with a large number of virtual users against the system applications, as in a real-world scenario; (v) validation of the results by extracting data from different profile points of the application threads and nodes on completion of the tests; (vi) training of the model using a semi-supervised learning approach (deep learning paradigm) [7], [11]; and (vii) forecasting of the likelihood of traffic bursts (excessive CPU usage) using the trained model [4], [6].…”
Section: Experiments for Validation (mentioning, confidence: 99%)
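The last two steps of the quoted pipeline, training a model on the extracted workload data and forecasting the likelihood of traffic bursts, can be sketched in heavily simplified form. The snippet below is a plain supervised stand-in for the semi-supervised deep-learning approach the authors describe: it fabricates a synthetic CPU-usage series, builds sliding-window features over 5-minute intervals, and fits a logistic-regression classifier to predict whether the next interval exceeds a burst threshold. The data, the threshold, and the use of scikit-learn are all assumptions made only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simplified, assumed stand-in for the quoted forecasting step: predict whether
# CPU usage in the next 5-minute interval exceeds a burst threshold, using the
# previous few intervals as features. Synthetic data only; not the authors' model.

rng = np.random.default_rng(0)
n_intervals, window, burst_threshold = 2000, 6, 80.0

# Synthetic CPU-usage series (%): a daily-like cycle plus noise and rare spikes.
t = np.arange(n_intervals)
cpu = 50 + 20 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 5, n_intervals)
cpu += 40 * (rng.random(n_intervals) < 0.02)       # occasional traffic bursts
cpu = np.clip(cpu, 0, 100)

# Sliding-window features: the last `window` readings; label: burst in the next interval?
X = np.stack([cpu[i:i + window] for i in range(n_intervals - window)])
y = (cpu[window:] > burst_threshold).astype(int)

split = int(0.8 * len(X))
model = LogisticRegression(max_iter=1000).fit(X[:split], y[:split])
prob_burst = model.predict_proba(X[split:])[:, 1]
print("mean predicted burst likelihood on held-out windows:",
      round(float(prob_burst.mean()), 3))
```

Any classifier, including the recurrent or deep models the authors presumably used, could stand in the same place; the windowed features and the burst label correspond loosely to steps (ii) and (vii) of the quoted process.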
“…Consequently, the integration of humans (i.e., doctors) into the learning process, and the interaction of an expert's knowledge with the automatically learned data, would greatly enhance the knowledge discovery process [340]. While there is some previous work from other domains, particularly in the training of robots [341], [342], human-in-the-loop interactive RL is not yet well established in the healthcare domain. It remains open for future research to transfer the insights from existing studies into the healthcare domain to ensure successful applications of existing RL methods.…”
Section: B. Integration of Prior Knowledge (mentioning, confidence: 99%)