2022
DOI: 10.48550/arxiv.2211.00241
Preprint

Adversarial Policies Beat Superhuman Go AIs

Abstract: We attack the state-of-the-art Go-playing AI system, KataGo, by training an adversarial policy that plays against a frozen KataGo victim. Our attack achieves a >99% win-rate against KataGo without search, and a >50% win-rate when KataGo uses enough search to be near-superhuman. To the best of our knowledge, this is the first successful end-to-end attack against a Go AI playing at the level of a top human professional. Notably, the adversary does not win by learning to play Go better than KataGo; in fact, the ad…
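The core idea the abstract describes — training an adversary against a *frozen* victim, so the attacker only needs to exploit the victim's fixed weaknesses rather than play well in general — can be illustrated with a deliberately simple toy. The sketch below is not the paper's method (KataGo and the actual attack are far more complex); it is a minimal, hypothetical example in which a frozen, biased victim policy is exploited by an adversary that simply estimates the victim's behavior from play and best-responds.

```python
import random

# Toy illustration of "adversarial policy vs. frozen victim" (hypothetical,
# not the paper's algorithm). The victim plays rock-paper-scissors with a
# fixed bias; because it is frozen, it can never adapt to being exploited.

random.seed(0)

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def frozen_victim():
    # Frozen policy: never updates, biased toward "rock".
    return random.choices(ACTIONS, weights=[0.6, 0.2, 0.2])[0]

def train_adversary(episodes=2000):
    # "Training" here is just estimating the victim's action distribution
    # from observed play, then committing to the deterministic best response.
    counts = {a: 0 for a in ACTIONS}
    for _ in range(episodes):
        counts[frozen_victim()] += 1
    most_common = max(counts, key=counts.get)
    return BEATS[most_common]

def evaluate(adversary_action, games=1000):
    # Fraction of games the adversary's fixed action beats the victim.
    wins = sum(BEATS[frozen_victim()] == adversary_action for _ in range(games))
    return wins / games

best_response = train_adversary()
win_rate = evaluate(best_response)
print(best_response, win_rate)
```

The point of the toy is the asymmetry: the adversary is a far weaker "player" in absolute terms (it only ever plays one move), yet it reliably beats the frozen victim — mirroring the paper's observation that the adversary wins without learning to play Go better than KataGo.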

Cited by 5 publications (4 citation statements) | References 10 publications

“…One might hope that these factors will lead to very reliable performance. However, in [14], we show that these systems still harbor severe weaknesses and can make game-losing mistakes that much weaker humans would not. These mistakes can also be induced by humans without algorithmic assistance, and transfer in a zero-shot setting to different Go systems.…”
Section: Phase 1: Specific Applications and Domains
confidence: 82%
“…Modern RL methods in multiplayer iterated settings are typically multi-agent [29,30,41]. While our experiments provide initial insights on adversarial collusion, recent studies have also shown that even state-of-the-art multi-agent RL agents can be exploited by adversarial policies [42]. More work would be needed to assess the effectiveness of our attack against algorithms trained using multi-agent RL techniques.…”
Section: Discussion
confidence: 96%
“…AI-assisted technologies can encourage curiosity, questioning, systematic thinking, trial and error, reasoning, and elaboration (Güss et al 2021). A recently published preprint (Wang et al 2022) shares how curiosity might encourage humans to use AI's 'blind spots.' The underlying argument is that AI-enabled systems encourage workers to reach wide and deep into their knowledge bases, resulting in novel ideas (Althuizen and Reichel 2016).…”
Section: AI As a General-purpose Tool for Innovative Behaviour
confidence: 99%