2020
DOI: 10.1609/aaai.v34i04.5986

Achieving Fairness in the Stochastic Multi-Armed Bandit Problem

Abstract: We study an interesting variant of the stochastic multi-armed bandit problem, which we call the Fair-MAB problem, where, in addition to the objective of maximizing the sum of expected rewards, the algorithm also needs to ensure that at any time, each arm is pulled at least a pre-specified fraction of times. We investigate the interplay between learning and fairness in terms of a pre-specified vector denoting the fractions of guaranteed pulls. We define a fairness-aware regret, which we call r-Regret, that take…
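
To make the constraint in the abstract concrete, the following is a minimal sketch of one way to combine a per-arm pull quota with a standard UCB1 index. It is not taken from the paper and should not be read as the authors' algorithm: the function name fair_ucb, the Bernoulli reward model, the largest-deficit tie-breaking rule, and all parameter values are assumptions made for illustration. At each round the sketch first serves the arm that is furthest below its guaranteed share; only when no arm is behind does it fall back to UCB1.

```python
import math
import random


def fair_ucb(means, fractions, horizon, seed=0):
    """Simulate a quota-constrained UCB1 on Bernoulli arms (illustrative only).

    means     -- hypothetical true success probability of each arm
    fractions -- guaranteed fraction of pulls per arm (must sum to at most 1)
    horizon   -- number of rounds to simulate
    Returns the list of pull counts per arm.
    """
    if sum(fractions) > 1:
        raise ValueError("guaranteed fractions must sum to at most 1")
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k

    for t in range(1, horizon + 1):
        # Deficit of arm i: its quota after t rounds minus pulls received so far.
        deficits = [fractions[i] * t - pulls[i] for i in range(k)]
        behind = max(range(k), key=lambda i: deficits[i])
        if deficits[behind] > 0:
            # Fairness step: serve the arm that is furthest below its quota.
            arm = behind
        else:
            untried = [i for i in range(k) if pulls[i] == 0]
            if untried:
                # UCB1 needs at least one sample per arm before using the index.
                arm = untried[0]
            else:
                # Learning step: standard UCB1 index (mean plus exploration bonus).
                arm = max(
                    range(k),
                    key=lambda i: totals[i] / pulls[i]
                    + math.sqrt(2.0 * math.log(t) / pulls[i]),
                )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


if __name__ == "__main__":
    # Assumed example values: three arms with quotas of 10%, 20% and 30%.
    print(fair_ucb([0.9, 0.5, 0.1], [0.1, 0.2, 0.3], 2000))
```

In this sketch the guaranteed fractions act as a floor on each arm's pull count, and only the rounds left over after the quotas are met are allocated by the learning rule. How tightly the quota must hold at every round, and how regret should be measured against a fairness-aware benchmark, are the questions the abstract attributes to the paper itself.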

Cited by 47 publications (35 citation statements)
References 12 publications

“…Zhang et al [93] investigated specifically the dynamics of group qualification rates [61,79] under the more general partially-observed MDP setting. Moreover, there also exist some studies of fairness-aware (contextual) multi-armed bandits [5,28,29,44,73] - a simplified form of reinforcement learning - where the fairness constraint is defined as a minimum rate at which a task/resource is assigned to a user [12,13,48,62,85].…”
Section: Dynamic Recommendation Fairness (mentioning)
confidence: 99%
“…Fairness under Other Temporal Models A series of works focus on study fairness in the setting of online learning (Blum et al, 2018; Bechavod et al, 2019; Gupta and Kamble, 2019), multi-armed bandit (Joseph et al, 2016; Patil et al, 2020;, and one-step feedback model (Liu et al, 2018; Hu and Chen, 2018; Kannan et al, 2019). In particular, Hashimoto et al (2018) show that empirical risk minimization amplifies representation disparity over time with a low group retention rate for the underrepresented group.…”
Section: Related Work (mentioning)
confidence: 99%
“…Manshadi et al [39] studied fair online rationing such that each arriving agent can receive a fair share of resources proportional to its demand. The fairness issue has been studied in other domains/applications as well, see, e.g., online selection of candidates [40], influence maximization [41], bandit-based online learning [42][43][44], online resource allocation [45,46], and classification [47].…”
Section: Other Related Work (mentioning)
confidence: 99%