Generally, the multi-armed bandit problem has been studied under the setting in which, at each time step over an infinite horizon, a controller chooses to activate a single process or bandit out of a finite collection of independent processes (statistical experiments, populations, etc.) for a single period, receiving a reward that is a function of the activated process and thereby advancing the chosen process. Classically, rewards are discounted by a constant factor β ∈ (0, 1) per round. In this paper, we present a solution to the problem, with potentially non-Markovian, uncountable-state-space reward processes, under a framework in which, first, the discount factors may be non-uniform and vary over time, and second, the periods of activation of each bandit need not be fixed or uniform, being subject instead to a possibly stochastic duration of activation before a change to a different bandit is allowed. The solution is based on generalized restart-in-state indices, and it utilizes a view of the problem not as "decisions over state space" but rather as "decisions over time".
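As a rough illustration (the notation below is ours, not the paper's), the classical objective discounts rewards uniformly, whereas the framework described above replaces the constant discount with time-varying factors and constrains switches between bandits to occur only after a possibly stochastic activation duration:

```latex
% Classical objective: rewards discounted by a constant factor beta per round.
\max_{\pi}\ \mathbb{E}^{\pi}\!\Big[\sum_{t=0}^{\infty} \beta^{t} R_{t}\Big],
\qquad \beta\in(0,1).

% Hedged sketch of the generalized objective described above (notation assumed):
% beta_s are time-varying discount factors, and the active bandit may change
% only at epochs T_0 < T_1 < ... separated by possibly stochastic durations tau_k.
\max_{\pi}\ \mathbb{E}^{\pi}\!\Big[\sum_{t=0}^{\infty}
  \Big(\prod_{s=0}^{t-1}\beta_{s}\Big) R_{t}\Big],
\qquad T_{k+1}=T_{k}+\tau_{k}.
```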
Consider the problem of a controller sampling sequentially from a finite number of N ≥ 2 populations, specified by random variables X^i_k, i = 1, . . . , N and k = 1, 2, . . ., where X^i_k denotes the outcome from population i the k-th time it is sampled. It is assumed that for each fixed i, {X^i_k}_{k≥1} is a sequence of i.i.d. uniform random variables over some interval [a_i, b_i], with the support (i.e., a_i, b_i) unknown to the controller. The objective is to have a policy π for deciding, based on available data, from which of the N populations to sample at any time n = 1, 2, . . . so as to maximize the expected sum of outcomes of n samples or, equivalently, to minimize the regret due to the lack of information about the parameters {a_i} and {b_i}. In this paper, we present a simple UCB-type policy that is asymptotically optimal. Additionally, finite-horizon regret bounds are given.
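The following is a minimal sketch of a generic UCB-style allocation rule of the kind referred to above, assuming an index of the form sample mean plus a standard sqrt(2 log n / n_i) inflation term; the specific asymptotically optimal index proposed in the paper for uniform rewards with unknown support may differ.

```python
import math
import random

class UCBController:
    """Generic UCB-style allocation sketch for N populations.

    The inflation term sqrt(2*log(t)/n_i) is an assumption made for
    illustration, not necessarily the index analyzed in the paper.
    """

    def __init__(self, n_arms):
        self.counts = [0] * n_arms      # times each population was sampled
        self.means = [0.0] * n_arms     # running sample means
        self.t = 0                      # total samples drawn so far

    def select(self):
        # Sample each population once before using the index.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        ucb = [m + math.sqrt(2.0 * math.log(self.t) / c)
               for m, c in zip(self.means, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Hypothetical usage: two uniform populations with (unknown) supports.
arms = [(0.0, 1.0), (0.2, 0.9)]
ctrl = UCBController(len(arms))
for _ in range(1000):
    i = ctrl.select()
    a, b = arms[i]
    ctrl.update(i, random.uniform(a, b))
```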
The purpose of this paper is to provide further understanding of the structure of the sequential allocation ("stochastic multi-armed bandit", or MAB) problem by establishing probability-one finite-horizon bounds and convergence rates for the sample (or "pseudo") regret associated with two simple classes of allocation policies π. For any slowly increasing function g, subject to mild regularity constraints, we construct two policies (the g-Forcing and the g-Inflated Sample Mean policies) that achieve a measure of regret of order O(g(n)) almost surely as n → ∞, bounded from above and below. Additionally, almost sure upper and lower bounds on the remainder term are established. In the constructions herein, the function g effectively controls the "exploration" of the classical "exploration/exploitation" tradeoff.
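One plausible reading of a g-Forcing rule, sketched below under our own assumptions, is that the slowly increasing function g sets a floor on how often each arm must have been sampled: if some arm's count falls below g(n), it is sampled (forced exploration); otherwise the arm with the largest sample mean is played. The threshold and reward model here are illustrative, not the exact policies or analysis of the paper.

```python
import math
import random

def g_forcing_step(g, means, counts, n):
    """One step of a hedged g-Forcing sketch.

    Forces a sample from any population sampled fewer than g(n) times;
    otherwise exploits the largest current sample mean.  The rule
    "counts[i] < g(n)" is an illustrative assumption.
    """
    for i, c in enumerate(counts):
        if c < g(n):
            return i                                   # forced exploration
    return max(range(len(means)), key=means.__getitem__)  # exploitation


# Example run with a slowly increasing g(n) = log(n + 1).
g = lambda n: math.log(n + 1)
true_means = [0.4, 0.6]                 # illustrative Bernoulli reward means
counts = [0, 0]
means = [0.0, 0.0]
for n in range(1, 2001):
    i = g_forcing_step(g, means, counts, n)
    r = 1.0 if random.random() < true_means[i] else 0.0
    counts[i] += 1
    means[i] += (r - means[i]) / counts[i]
```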