2008
DOI: 10.1016/j.amc.2007.07.043
Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems

Cited by 68 publications (34 citation statements)
References 13 publications

Citation statements (ordered by relevance):
“…While there is no such currently established framework for social learning research, multi-armed bandits have been widely deployed to study learning across biology, economics, artificial intelligence research and computer science (e.g. 18, 24, 25–28) because they mimic a common problem faced by individuals that must make decisions about how to allocate their time in order to maximize their payoffs. Multi-armed bandits capture the essence of many difficult problems in the real world, for instance, where there are many possible actions, only a few of which yield a high payoff, where it is possible to learn asocially or through observation of others, where copying error occurs and where the environment changes.…”
Section: The Tournament
Citation type: mentioning
confidence: 99%
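For readers unfamiliar with the setting these excerpts describe, here is a minimal sketch of a non-stationary multi-armed bandit: every pull returns a noisy reward, and the arms' payoff means drift over time, so the best arm can change. The class name, the Gaussian random-walk drift model, and all parameter values are illustrative assumptions, not details from the cited paper.

```python
import random

class NonstationaryBandit:
    """A k-armed bandit whose arm means drift by a Gaussian random walk,
    so the best arm can change over time (illustrative sketch only)."""

    def __init__(self, k=10, drift=0.01, seed=0):
        self.rng = random.Random(seed)
        self.k = k
        self.drift = drift
        self.means = [self.rng.gauss(0.0, 1.0) for _ in range(k)]

    def pull(self, arm):
        # Reward is noisy feedback around the arm's current mean.
        reward = self.rng.gauss(self.means[arm], 1.0)
        # Every mean then takes a small random step: the source of
        # non-stationarity ("where the environment changes").
        self.means = [m + self.rng.gauss(0.0, self.drift) for m in self.means]
        return reward
```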
“…The restless bandit problem is known to be PSPACE-complete, meaning that optimal solutions are difficult to compute in practice [30, 13]. Multi-armed bandit problems have previously been used to study the tradeoff between exploitation and exploration in learning environments [36, 24].…”
Section: Related Computational Techniques
Citation type: mentioning
confidence: 99%
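The excerpt above mentions the tradeoff between exploitation and exploration; a common baseline for navigating it is the epsilon-greedy rule, sketched below against the hypothetical `NonstationaryBandit` from the previous snippet. Epsilon-greedy is a standard textbook heuristic, not the algorithm proposed by the cited paper.

```python
import random

def epsilon_greedy(bandit, k=10, steps=1000, epsilon=0.1, seed=1):
    """Pull `steps` arms, exploiting the current value estimates most of
    the time and exploring a uniformly random arm with probability epsilon."""
    rng = random.Random(seed)
    q = [0.0] * k   # value estimate per arm
    n = [0] * k     # pull count per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                   # explore
        else:
            arm = max(range(k), key=lambda a: q[a])  # exploit
        reward = bandit.pull(arm)
        n[arm] += 1
        q[arm] += (reward - q[arm]) / n[arm]  # incremental sample average
        total += reward
    return total
```

For example, `epsilon_greedy(NonstationaryBandit())` returns the total reward accumulated over 1,000 pulls.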
“…In its simplest form, reinforcement-learning analyses often use the multi-armed (or "n-armed") bandit task to evaluate various methods of distributing exploration and exploitation (e.g., Dimitrakakis & Lagoudakis, 2008; Sikora, 2008). This task provides an excellent platform to explore choice in stationary (with unchanging payoffs) and nonstationary (with changing payoffs) environments, and it has also been applied to the domains of human learning and cognition (e.g., Burns, Lee, & Vickers, 2006; Plowright & Shettleworth, 1990), economics (e.g., Banks, Olson, & Porter, 1997), marketing and management (e.g., Azoulay-Schwartz, Kraus, & Wilkenfeld, 2004; Valsecchi, 2003), and math and computer science (e.g., Auer, Cesa-Bianchi, Freund, & Schapire, 1995; Koulouriotis & Xanthopoulos, 2008).…”
Citation type: mentioning
confidence: 99%
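To make the stationary/nonstationary distinction in the excerpt concrete, the sketch below contrasts the two textbook value updates described by Sutton & Barto (1998): the sample average, which suits unchanging payoffs, and the constant step size, which keeps tracking changing ones. The function name and calling convention are illustrative assumptions; the variable names follow the epsilon-greedy sketch above.

```python
def update_value(q, arm, reward, n=None, alpha=None):
    """Two textbook value updates (Sutton & Barto, 1998).

    Sample average (pass `n`): appropriate when payoffs are stationary,
    but it weighs old rewards as heavily as new ones, so it adapts
    slowly once payoffs start to drift.
    Constant step size (pass `alpha` in (0, 1]): an exponential
    recency-weighted average that keeps tracking changing payoffs.
    """
    if alpha is not None:
        q[arm] += alpha * (reward - q[arm])    # non-stationary setting
    else:
        n[arm] += 1
        q[arm] += (reward - q[arm]) / n[arm]   # stationary setting
```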
“…We took an approach that was inspired by the study of reinforcement-learning algorithms as applied to machine learning (Koulouriotis & Xanthopoulos, 2008; Sutton & Barto, 1998). In its simplest form, reinforcement-learning analyses often use the multi-armed (or "n-armed") bandit task to evaluate various methods of distributing exploration and exploitation (e.g., Dimitrakakis & Lagoudakis, 2008; Sikora, 2008).…”
Citation type: mentioning
confidence: 99%