2021
DOI: 10.48550/arxiv.2112.11701
Preprint

Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination

Abstract: An AI agent should be able to coordinate with humans to solve tasks. We consider the problem of training a Reinforcement Learning (RL) agent without using any human data, i.e., in a zero-shot setting, to make it capable of collaborating with humans. Standard RL agents learn through self-play. Unfortunately, these agents only know how to collaborate with themselves and normally do not perform well with unseen partners, such as humans. The methodology of how to train a robust agent in a zero-shot fashion is stil…

Cited by 6 publications (38 citation statements)
References 24 publications
“…Training the agents with diverse partners is effective to alleviate over-fitting to specific partners and improve ZSC performance. Population-based training (PBT) methods [26,46,56] have achieved state-of-the-art performance in ZSC. In Fig.…”
Section: Related Work
mentioning
confidence: 99%
“…FCP [46] trains a diverse population by setting different random seeds and including partners of level of cooperation skills and architectures. TrajeDi [26] and MEP [56] adopt explicit diversity objective to generate diverse policies as partners and achieve state-of-the-art ZSC performance. Our PECAN also maintains a population of policies.…”
Section: Related Work
mentioning
confidence: 99%