2014
DOI: 10.1007/978-3-319-12027-0_43

Using a Priori Information for Fast Learning Against Non-stationary Opponents

Cited by 2 publications (2 citation statements)
References 7 publications

“…Existing MAS literature outside of the scope of trust and reputation has looked at identifying the best behaviour policy to use from a set of policies, by identifying other agents’ behaviours and responding to that. Part of this task involves identifying when other agents have changed their own policy (Hernandez-Leal et al., 2016). One reason we do not compare against policy change literature is in our context, agent behaviours are not a series of static policies that are swapped between, but rather are continuous values that can change gradually or suddenly.…”
Section: Related Work
confidence: 99%
“…They used a Markov Decision Process (MDP) algorithm as an opponent model learning mechanism, specifically the MDP-CL framework (Hernandez-Leal et al., 2014a). They also proposed an algorithm (2014b) that can use a priori information, in the form of a set of models, in order to promote faster detection of the opponent model; their algorithm also keeps a record of the models in case the opponent reuses his previous strategies.…”
Section: An Overview of Background and Related Work
confidence: 99%
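
The mechanism described in this excerpt, keeping a set of a priori opponent models, scoring them against the opponent's recent behaviour, and recording learned models so that a reused strategy is recognised quickly, can be sketched roughly as follows. This is an illustrative sketch only, not the MDP-CL implementation from Hernandez-Leal et al.; the class name, window size, likelihood threshold, and model representation are assumptions made for exposition.

```python
# Sketch (assumed design, not the authors' code): pick the stored opponent
# model that best explains the opponent's recent actions, and keep a record
# of models so returning strategies are detected without relearning.
import math
from collections import deque


class ModelLibrary:
    def __init__(self, prior_models, window=20, threshold=-2.0):
        # prior_models: dict of name -> policy, where a policy maps
        # state -> {action: probability}. Window and threshold are
        # illustrative choices.
        self.models = dict(prior_models)      # record of known opponent strategies
        self.history = deque(maxlen=window)   # recent (state, action) observations
        self.threshold = threshold            # min avg log-likelihood to accept a model

    def observe(self, state, action):
        # Store one observed opponent move.
        self.history.append((state, action))

    def _avg_loglik(self, policy):
        # Average log-likelihood of the observed actions under `policy`.
        if not self.history:
            return 0.0
        total = 0.0
        for state, action in self.history:
            p = policy.get(state, {}).get(action, 1e-6)  # small floor for unseen pairs
            total += math.log(p)
        return total / len(self.history)

    def best_model(self):
        # Fast detection: reuse the stored model that best explains recent
        # play, if any of them explains it well enough.
        score, name = max((self._avg_loglik(p), n) for n, p in self.models.items())
        return (name, score) if score >= self.threshold else (None, score)

    def register(self, name, policy):
        # Keep a newly learned model so the opponent's return to this
        # strategy can be detected quickly later.
        self.models[name] = policy
```

In this reading, a priori information shortens detection because a matching stored model only needs a short window of observations to dominate the likelihood comparison, whereas learning a model from scratch would require much more interaction.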