2021
DOI: 10.1109/tit.2021.3081508

Multi-Armed Bandits With Correlated Arms

Cited by 24 publications (17 citation statements); references 21 publications.
“…The vaccination strategy is kept fixed over all simulation days. We define a vaccination strategy as a quintuple of the different vaccine types, divided over 5 age groups: Children (0-4), Youngsters (5-18), Young Adults (19-25), Adults (26-64) and Elderly (65+).…”
Section: Vaccine Allocation
confidence: 99%
“…Some algorithms are proved instance-optimal for specific interactive decision-making problems. Variants of UCB algorithms are instance-optimal for bandits under various assumptions [Lattimore and Szepesvári, 2020, Gupta et al., 2021, Tirinzoni et al., 2020, Degenne et al., 2020, Magureanu et al., 2014], but are suboptimal for linear bandits [Lattimore and Szepesvari, 2017]. These algorithms rely on the optimism-in-the-face-of-uncertainty principle to handle the exploration-exploitation tradeoff, whereas our algorithm explicitly finds the best tradeoff.…”
Section: Additional Related Work
confidence: 99%
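The optimism-in-the-face-of-uncertainty principle mentioned in this excerpt can be illustrated with a minimal UCB1 sketch. The arm means, horizon, and reward model below are illustrative stand-ins, not taken from the cited papers:

```python
import math
import random

def ucb1(pull, n_arms, horizon, seed=0):
    """Minimal UCB1: play each arm once, then repeatedly pick the arm
    maximizing empirical mean + sqrt(2 ln t / n_i), i.e. an optimistic
    upper confidence bound on its true mean."""
    rng = random.Random(seed)
    counts = [0] * n_arms     # pulls per arm
    sums = [0.0] * n_arms     # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1       # initialization: try every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(arm, rng)
        counts[arm] += 1
        sums[arm] += r
    return counts

# Hypothetical Bernoulli arms with means 0.3, 0.5, 0.7; over 2000 rounds
# UCB1 should concentrate its pulls on the best arm (index 2).
means = [0.3, 0.5, 0.7]
counts = ucb1(lambda i, rng: 1.0 if rng.random() < means[i] else 0.0,
              n_arms=3, horizon=2000)
```

The optimism term `sqrt(2 ln t / n_i)` shrinks as an arm is pulled more often, so under-explored arms look attractive early while the empirically best arm dominates later.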
“…Alternatively, in a PUT that has a limit on the length of inputs it accepts, mutations such as inserting a constant string or copying a partial byte sequence from another seed are not promising arms. While reward distributions are assumed to be independent in standard stochastic bandit problems, there are studies on problem settings where the arms are correlated, aimed at further reducing regret compared to the standard setting [30,33,55]. Even assuming independence, bandit algorithms can greatly improve the efficiency of the fuzzer.…”
Section: Correlation Of Arms
confidence: 99%
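A minimal sketch of treating mutation operators as bandit arms, as this excerpt describes. The operator names and "new coverage" probabilities are hypothetical stand-ins, and the selection rule here is a simple epsilon-greedy policy rather than any algorithm from the cited papers:

```python
import random

def epsilon_greedy_fuzz(operators, reward, rounds=500, eps=0.1, seed=1):
    """Epsilon-greedy selection of mutation operators: with probability eps
    (or until every operator has been tried) explore a random operator,
    otherwise exploit the operator with the best empirical mean reward."""
    rng = random.Random(seed)
    counts = {op: 0 for op in operators}
    sums = {op: 0.0 for op in operators}
    for _ in range(rounds):
        if rng.random() < eps or min(counts.values()) == 0:
            op = rng.choice(operators)
        else:
            op = max(operators, key=lambda o: sums[o] / counts[o])
        r = reward(op, rng)   # e.g. 1.0 if the mutated input hit new coverage
        counts[op] += 1
        sums[op] += r
    return counts

# Hypothetical operators with stand-in "new coverage" probabilities.
ops = ["bitflip", "insert_const", "copy_chunk"]
probs = {"bitflip": 0.2, "insert_const": 0.02, "copy_chunk": 0.05}
counts = epsilon_greedy_fuzz(
    ops, lambda o, rng: 1.0 if rng.random() < probs[o] else 0.0)
```

Over enough rounds the scheduler shifts its budget toward the operator that most often yields new coverage, which is the efficiency gain the excerpt attributes to bandit algorithms even under the independence assumption.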