2019
DOI: 10.1561/2200000068

Introduction to Multi-Armed Bandits

Abstract: Also available as a combined paper and online subscription.

Cited by 393 publications (116 citation statements)
References 27 publications

“…In fact, in many circumstances, it seems rather prudent to assume that information about outcome values and probabilities is shaped by past encounters with the same decision problem. Experimentally, this configuration is often translated into multi-armed bandit problems (starting with Thompson [59], but see [60] for a review), where the decision-maker faces abstract cues of unknown value and has to figure out the value of the options by trial and error. Computationally, behaviour in multi-armed bandit problems is generally well-captured by associative or reinforcement learning processes.…”
[Figure 2 residue: panel labels citing Wu & Gonzalez [54], De Martino et al. [25], Pessiglione et al. [56], Fiorillo et al. [53], Platt & Glimcher [52].]
Section: The Experience-Description Gap (mentioning)
confidence: 99%
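
The trial-and-error value learning this statement describes is commonly modelled with a delta-rule (Rescorla-Wagner-style) update combined with a softmax choice rule. Below is a minimal illustrative sketch of such a learner on a Bernoulli bandit; the arm reward probabilities, learning rate alpha, and inverse temperature beta are assumed values, not parameters from the cited work.

```python
import math
import random

def simulate_rl_learner(reward_probs, n_trials=200, alpha=0.1, beta=5.0):
    """Delta-rule (Rescorla-Wagner) value learning with softmax choice.

    reward_probs, alpha (learning rate), and beta (inverse temperature)
    are illustrative assumptions, not values from the cited study.
    """
    values = [0.0] * len(reward_probs)  # learned value estimate per arm
    choices = []
    for _ in range(n_trials):
        # Softmax: higher-valued arms are sampled more often.
        weights = [math.exp(beta * v) for v in values]
        arm = random.choices(range(len(values)), weights=weights)[0]
        reward = 1.0 if random.random() < reward_probs[arm] else 0.0
        # Delta rule: nudge the estimate toward the observed reward.
        values[arm] += alpha * (reward - values[arm])
        choices.append(arm)
    return values, choices

values, _ = simulate_rl_learner([0.3, 0.7])
print("learned values:", values)
```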
“…The MAB problem is a purely online ML problem, in which the player strives to gain the maximum reward from multiple arms of slot machines [27,39]. More precisely, the MAB problem aims to detect and select, through a finite number of trials, the arm that maximizes the long-term reward.…”
Section: General Single-Player MAB Strategy (mentioning)
confidence: 99%
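
As a concrete illustration of a single-player strategy that trades off exploring the arms against exploiting the best one, here is a sketch of epsilon-greedy play over Bernoulli arms. The arm probabilities, epsilon, and trial count are made-up parameters, and epsilon-greedy is a generic textbook strategy, not necessarily the one used in the citing paper.

```python
import random

def epsilon_greedy(reward_probs, n_trials=1000, epsilon=0.1):
    """Epsilon-greedy over Bernoulli arms (illustrative parameters)."""
    k = len(reward_probs)
    counts = [0] * k        # pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_trials):
        if random.random() < epsilon:
            arm = random.randrange(k)  # explore a random arm
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit
        reward = 1.0 if random.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean update for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

est, total = epsilon_greedy([0.2, 0.5, 0.8])
print("estimates:", est, "total reward:", total)
```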
“…Multi-Armed Bandit (MAB) is a powerful framework that allows agents to solve sequential decision-making problems under uncertainty [16]. In the standard version, an algorithm has K possible actions (or arms) to choose from and T rounds (or time-steps).…”
Section: Introduction (mentioning)
confidence: 99%
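
The K-arms, T-rounds formulation described above is often handled with the classic UCB1 index rule, one of the standard algorithms covered by the surveyed monograph. The sketch below assumes Bernoulli rewards with made-up arm probabilities; it is a textbook implementation, not code from the cited paper.

```python
import math
import random

def ucb1(reward_probs, T=1000):
    """UCB1 over K Bernoulli arms for T rounds (illustrative setup)."""
    K = len(reward_probs)
    counts = [0] * K   # pulls per arm
    means = [0.0] * K  # empirical mean reward per arm
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1  # pull each arm once to initialize estimates
        else:
            # Choose the arm with the highest upper confidence bound.
            arm = max(range(K),
                      key=lambda i: means[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if random.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return means, counts

means, counts = ucb1([0.4, 0.6, 0.9])
print("empirical means:", means, "pull counts:", counts)
```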