2023 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra48891.2023.10160557

Explainable Action Advising for Multi-Agent Reinforcement Learning

Abstract: The ability to model the mental states of others is crucial to human social intelligence, and can offer similar benefits to artificial agents with respect to the social dynamics induced in multi-agent settings. We present a method of grounding semantically meaningful, human-interpretable beliefs within policies modeled by deep networks. We then consider the task of 2nd-order belief prediction. We propose that the ability of each agent to predict the beliefs of the other agents can be used as an intrinsic reward signal…
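
The abstract describes grounding interpretable beliefs in deep policies and using second-order belief prediction as an intrinsic reward. Below is a minimal, hypothetical sketch of that reward-shaping idea; it is not the paper's implementation, and the network sizes, the `scale` weight, and all variable names are assumptions for illustration.

```python
# Sketch only (not the authors' method): second-order belief prediction as an
# intrinsic reward. Assumes each agent exposes a low-dimensional,
# human-interpretable belief vector, and that agent i trains a predictor of
# agent j's belief; the negative prediction error is an intrinsic bonus.
import torch
import torch.nn as nn

class BeliefPredictor(nn.Module):
    """Predicts another agent's belief vector from this agent's observation."""
    def __init__(self, obs_dim: int, belief_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, belief_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def intrinsic_reward(predictor: BeliefPredictor,
                     obs_i: torch.Tensor,
                     belief_j: torch.Tensor,
                     scale: float = 0.1) -> torch.Tensor:
    """Higher reward when agent i predicts agent j's belief more accurately."""
    pred = predictor(obs_i)
    error = torch.mean((pred - belief_j.detach()) ** 2, dim=-1)
    return -scale * error  # added to the extrinsic reward during training

# Toy usage: batch of 8 observations (dim 16) and target beliefs (dim 4).
predictor = BeliefPredictor(obs_dim=16, belief_dim=4)
r_int = intrinsic_reward(predictor, torch.randn(8, 16), torch.rand(8, 4))
print(r_int.shape)  # torch.Size([8])
```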

Cited by 5 publications (2 citation statements)
References 35 publications (28 reference statements)

“…Advising mechanism shares policy-level knowledge, where less experienced agents can take good actions without making decisions themselves. Unfortunately, many methods based on advising mechanism assume the teacher has a well-trained policy (Ilhan, Gow, and Perez Liebana 2021; Anand et al 2021; Guo et al 2023), or have a centralized information structure (Omidshafiei et al 2019; Kim et al 2020; Gupta et al 2021), or increase training overhead (Ilhan, Gow, and Perez Liebana 2021), or are limited to two agents (Omidshafiei et al 2019; Kim et al 2020). Two works that are similar to ours are AdHocTD (da Silva, Glatt, and Costa 2017) and PSAF (Zhu et al 2021), but both are based on tabular Q-learning and lack robustness to suboptimal advice.…”
Section: Knowledge Sharing (mentioning)
confidence: 99%
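
The advising mechanism described in this citation statement has a teacher share policy-level knowledge by overriding a student's action choices. Below is a minimal, hypothetical sketch of one budgeted advising step over tabular Q-values; it is not the implementation of any cited method, and the importance `threshold` heuristic, the budget handling, and all names are assumptions.

```python
# Sketch of budgeted action advising (illustrative only). A "teacher" policy
# overrides the student's action when the teacher's action-value gap suggests
# the state matters and advice budget remains.
import numpy as np

def choose_action(state_idx: int,
                  q_student: np.ndarray,
                  q_teacher: np.ndarray,
                  budget: int,
                  threshold: float = 0.5):
    """Return (action, budget_left, advised_flag) for one decision step."""
    teacher_q = q_teacher[state_idx]
    importance = teacher_q.max() - teacher_q.min()
    if budget > 0 and importance > threshold:
        return int(teacher_q.argmax()), budget - 1, True
    return int(q_student[state_idx].argmax()), budget, False

# Toy usage: 5 states, 3 actions, advice budget of 10 steps.
rng = np.random.default_rng(0)
q_s, q_t = rng.random((5, 3)), rng.random((5, 3))
action, budget, advised = choose_action(2, q_s, q_t, budget=10)
print(action, budget, advised)
```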
“…Performance estimates have been achieved so far using black-box inferences of representations following LLMs performance, requiring interpretation of specific beliefs a priori using some function-approximation mechanism (e.g. Oguntola et al, 2023). Likewise, allowing greater efficiency may reduce the expense of continually needing to scale models.…”
Section: Considerations For Artificial Agents (mentioning)
confidence: 99%
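
As a rough illustration of the "function-approximation mechanism" for interpreting specific beliefs that this statement alludes to, the sketch below trains a linear probe to read a synthetic binary belief out of hidden representations. The data, dimensions, and train/test split are placeholders, not taken from any cited work.

```python
# Sketch of a linear probe over hidden representations (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 32))          # stand-in hidden representations
belief = (hidden[:, 0] > 0).astype(int)      # synthetic binary belief label

probe = LogisticRegression(max_iter=1000).fit(hidden[:150], belief[:150])
print("probe accuracy:", probe.score(hidden[150:], belief[150:]))
```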