2017
DOI: 10.1609/aaai.v31i1.11115

A Finite Memory Automaton for Two-Armed Bernoulli Bandit Problems

Abstract: Existing approaches to the multi-armed bandit (MAB) primarily rely on perfect recall of past actions to generate estimates for arm payoff probabilities; it is further assumed that the decision maker knows whether arm payoff probabilities can change. To capture the computational limitations many decision making systems face, we explore performance under bounded resources in the form of imperfect recall of past information. We present a finite memory automaton (FMA) designed to solve static and dynamic MAB probl…
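
The truncated abstract does not describe the FMA's states or transitions, so the following is not the paper's automaton. As a rough illustration of what a bounded-memory learner for a two-armed Bernoulli bandit can look like, here is a minimal sketch of a classic Tsetlin-style two-action automaton, which remembers only a single integer state instead of the full history of pulls. The names `TsetlinAutomaton`, `depth`, and `simulate` are hypothetical and chosen for this example only.

```python
import random


class TsetlinAutomaton:
    """Illustrative two-action automaton with 2 * depth states.

    States 1..depth select arm 0 and states depth+1..2*depth select arm 1.
    State 1 and state 2*depth are the "most confident" states for each arm.
    This is a textbook-style sketch, not the FMA from the paper.
    """

    def __init__(self, depth):
        self.depth = depth
        # Start in the least confident state for arm 0 (an arbitrary choice).
        self.state = depth

    def choose_arm(self):
        return 0 if self.state <= self.depth else 1

    def update(self, reward):
        """Move deeper on a reward, toward (or across) the boundary otherwise."""
        if self.choose_arm() == 0:
            if reward:
                self.state = max(1, self.state - 1)
            else:
                self.state += 1  # from state depth this switches to arm 1
        else:
            if reward:
                self.state = min(2 * self.depth, self.state + 1)
            else:
                self.state -= 1  # from state depth+1 this switches to arm 0


def simulate(probs, depth, steps, seed=0):
    """Run the automaton on a static Bernoulli bandit; return pulls per arm."""
    rng = random.Random(seed)
    automaton = TsetlinAutomaton(depth)
    pulls = [0, 0]
    for _ in range(steps):
        arm = automaton.choose_arm()
        reward = rng.random() < probs[arm]  # Bernoulli payoff for this arm
        automaton.update(reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Assumed payoff probabilities, for demonstration only.
    print(simulate(probs=(0.3, 0.7), depth=4, steps=10_000))
```

The only memory is the integer `state`, so a larger `depth` makes the automaton slower to abandon an arm after a run of failures. That is one simple way to trade memory size against responsiveness when payoff probabilities may change.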

Citations: cited by 1 publication (1 citation statement)
References: 4 publications
“…Finite state machines have been previously used in game theory to model players with bounded rationality in iterative strategic games [6,7,8,9,10], to model the evolution of rational players [11], to analyze two-armed Bernoulli bandit problems [12], to model AI agents behavior in video games [13], and to specify AI agents for border patrol [14]. Kanovich, Kirgin, Nigam, and Scedrov proved NP-completeness of a security problem in collaborative systems with bounded-recall agents [15].…”
Section: Literature Review (mentioning)
confidence: 99%