2015
DOI: 10.1007/s10458-015-9283-7

Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning

Cited by 68 publications (89 citation statements) | References 14 publications
“…While the work mentioned above interprets human feedback as a numeric reward, Loftin et al. [32] interpret human feedback as categorical feedback strategies that depend both on the behavior the trainer is trying to teach and on the trainer's teaching strategy. They then propose an algorithm that infers knowledge about the desired behavior from cases where no feedback is provided.…”
Section: Reinforcement Learning from Human Reward
Confidence: 99%
“…While the work mentioned above interprets human feedback as a numeric reward, Loftin et al. [30] interpret human feedback as categorical feedback strategies that depend both on the behavior the trainer is trying to teach and on the trainer's teaching strategy. They infer knowledge about the desired behavior from cases where no feedback is provided, and show that their algorithms can learn faster than algorithms that treat the feedback as a numeric reward.…”
Section: Learning from Human Reward
Confidence: 99%
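To make the idea in these statements concrete, below is a minimal, hypothetical sketch of Bayesian inference over a target action in which silence ("no feedback") is itself an informative signal: the trainer is modeled as more likely to stay silent after a correct action than after an incorrect one. The action set and all trainer-model parameters (MU_CORRECT, MU_INCORRECT, EPS) are illustrative assumptions, not values or names from the paper.

```python
import numpy as np

# Candidate target actions the learner is trying to identify (illustrative).
ACTIONS = ["left", "right", "stay"]

# Assumed trainer model (hypothetical values):
MU_CORRECT = 0.6    # prob. the trainer stays silent after a correct action
MU_INCORRECT = 0.2  # prob. the trainer stays silent after an incorrect action
EPS = 0.05          # prob. of a mistaken feedback sign when feedback is given

def feedback_likelihood(feedback, action_matches_target):
    """P(feedback | taken action matches / does not match the target)."""
    mu = MU_CORRECT if action_matches_target else MU_INCORRECT
    if feedback == "none":
        return mu  # silence has a different likelihood under each hypothesis
    # Explicit feedback: reward if correct, punishment if incorrect,
    # with a small chance EPS of the opposite sign.
    right_sign = "reward" if action_matches_target else "punish"
    return (1 - mu) * ((1 - EPS) if feedback == right_sign else EPS)

def update(posterior, taken_action, feedback):
    """One Bayesian update of P(target action) after observing feedback."""
    for i, candidate in enumerate(ACTIONS):
        posterior[i] *= feedback_likelihood(feedback, taken_action == candidate)
    return posterior / posterior.sum()

# Usage: silence after taking "left" shifts belief toward "left" being
# the target, because silence is more likely after a correct action here.
posterior = np.ones(len(ACTIONS)) / len(ACTIONS)
posterior = update(posterior, "left", "none")
print(dict(zip(ACTIONS, posterior.round(3))))  # left gains probability mass
```

Under these assumed parameters a single silent trial moves the belief in the taken action from 1/3 to 0.6, which illustrates why a learner modeling the trainer's feedback strategy can extract information from trials where a purely numeric-reward learner would see nothing.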
“…Including a human in the loop has been recognized as an effective way to accelerate on-line policy learning (Thomaz and Breazeal, 2006; Khan et al., 2011; Cakmak and Lopes, 2012; Loftin et al., 2016). Most previous approaches employ teaching signals at the end of dialogues, either the whole human-to-human dialogue history or a single reward evaluating the performance of the human-machine dialogue (Su et al., 2016; Ferreira and Lefèvre, 2015).…”
Section: Companion Teaching for On-line Dialogue Policy Learning
Confidence: 99%