Traditional multi-armed bandit (MAB) formulations usually make certain assumptions about the underlying arms' distributions, such as bounds on the support or their tail behaviour. Moreover, such parametric information is usually 'baked' into the algorithms. In this paper, we show that specialized algorithms that exploit such parametric information are prone to inconsistent learning performance when the parameter is misspecified. Our key contributions are twofold: (i) we establish fundamental performance limits of statistically robust MAB algorithms under the fixed-budget pure exploration setting, and (ii) we propose two classes of algorithms that are asymptotically near-optimal. Additionally, we consider a risk-aware criterion for best arm identification, where the objective associated with each arm is a linear combination of the mean and the conditional value at risk (CVaR). Throughout, we make a very mild 'bounded moment' assumption, which lets us work with both light-tailed and heavy-tailed distributions within a unified framework.
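To make the risk-aware criterion concrete, the following is a minimal Python sketch of an empirical version of such an objective. The weights xi1 and xi2, the sample-sorting CVaR estimator, and the convention that low rewards constitute the risk are illustrative assumptions; the paper's exact estimators (for instance, truncation-based ones suited to heavy tails) may differ.

    import numpy as np

    def empirical_cvar(samples, alpha):
        # Empirical CVaR at level alpha for a reward distribution:
        # the average of the worst (lowest) alpha-fraction of samples.
        sorted_samples = np.sort(np.asarray(samples))
        k = max(1, int(np.ceil(alpha * len(sorted_samples))))
        return sorted_samples[:k].mean()

    def risk_aware_objective(samples, alpha, xi1, xi2):
        # Linear combination of the empirical mean and the empirical CVaR,
        # mirroring the mean/CVaR criterion described above.
        return xi1 * np.mean(samples) + xi2 * empirical_cvar(samples, alpha)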
The performance evaluation of loss service systems, where customers who cannot be served upon arrival get dropped, has a long history going back to the classical Erlang B model. In this paper, we consider the performance benefits arising from the possibility of deferring customers who cannot be served upon arrival. Specifically, we consider an Erlang B type loss system where the system operator can, subject to certain constraints, ask a customer who arrives when all servers are busy to come back at a specified time in the future. If the system is still fully loaded when the deferred customer returns, she gets dropped for good. For such a system, we ask: How should the system operator determine the 'rearrival' times of the deferred customers based on the state of the system (which includes those customers already deferred and yet to arrive)? How does one quantify the performance benefit of such a deferral policy? Our contributions are as follows. We propose a simple state-dependent policy for determining the rearrival times of deferred customers. For this policy, we characterize the long run fraction of customers dropped. We also analyse a relaxation where the deferral times are bounded in expectation. Via extensive numerical evaluations, we demonstrate the superiority of the proposed state-dependent policies over naive state-independent deferral policies.
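For reference, the no-deferral baseline implicit in this abstract is the classical Erlang B model, whose blocking probability (the long-run fraction of customers dropped) admits a standard numerically stable recursion, sketched below in Python. A state-dependent deferral policy of the kind proposed here would aim to push the drop fraction below this baseline.

    def erlang_b(servers: int, offered_load: float) -> float:
        # Blocking probability of the classical Erlang B model (no deferrals).
        # offered_load is the arrival rate divided by the service rate (in Erlangs).
        # Uses the standard recursion B(c) = a*B(c-1) / (c + a*B(c-1)), B(0) = 1.
        b = 1.0
        for c in range(1, servers + 1):
            b = offered_load * b / (c + offered_load * b)
        return b

    # Example: with 2 servers and an offered load of 1 Erlang,
    # erlang_b(2, 1.0) == 0.2, i.e., 20% of customers are dropped.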
We study regret minimization in a stochastic multi-armed bandit setting, and establish a fundamental trade-off between the regret suffered under an algorithm and its statistical robustness. Considering broad classes of underlying arms' distributions, we show that bandit learning algorithms with logarithmic regret are always inconsistent, and that consistent learning algorithms always suffer a superlogarithmic regret. This result highlights the inevitable statistical fragility of all 'logarithmic regret' bandit algorithms available in the literature: for instance, if a UCB algorithm designed for σ-subGaussian distributions is used in a subGaussian setting with a mismatched variance parameter, the learning performance could be inconsistent. Next, we show a positive result: statistically robust and consistent learning performance is attainable if we allow the regret to be slightly worse than logarithmic. Specifically, we propose three classes of distribution-oblivious algorithms that achieve an asymptotic regret that is arbitrarily close to logarithmic.
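To illustrate the fragility this abstract points to, here is a minimal Python sketch (not the paper's algorithm) of a UCB index calibrated to an assumed subGaussian parameter sigma. If an arm's true subGaussian parameter exceeds the assumed sigma, the confidence bonus is systematically too small, and the resulting under-coverage of the true mean is what can drive inconsistent learning.

    import math

    def ucb_index(empirical_mean: float, pulls: int, t: int, sigma: float) -> float:
        # Standard UCB index for sigma-subGaussian rewards: the exploration
        # bonus sigma * sqrt(2 log t / n) is calibrated to the assumed
        # parameter sigma. With a mismatched (too small) sigma, the bonus
        # shrinks and the index can persistently undervalue the best arm.
        return empirical_mean + sigma * math.sqrt(2.0 * math.log(t) / pulls)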