2012
DOI: 10.1007/978-1-4614-4109-0_8

Adaptive Policies for Sequential Sampling under Incomplete Information and a Cost Constraint

Abstract: We consider the problem of sequential sampling from a finite number of independent statistical populations to maximize the expected infinite horizon average outcome per period, under a constraint that the expected average sampling cost does not exceed an upper bound. The outcome distributions are not known. We construct a class of consistent adaptive policies, under which the average outcome converges with probability 1 to the true value under complete information for all distributions with finite means. We al…
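As a rough illustration of the "true value under complete information" mentioned in the abstract, the sketch below assumes the benchmark is a small linear program: choose randomization probabilities x_j over the populations to maximize the expected outcome, subject to the expected sampling cost not exceeding the bound. That formulation, the function name `solve_benchmark_lp`, and the numbers in the usage note are illustrative assumptions and are not taken from the paper. Because such a program has only two constraints besides nonnegativity, an optimal solution can be found by checking single populations and pairs of populations that use the cost bound exactly.

```python
from itertools import combinations


def solve_benchmark_lp(means, costs, budget):
    """Hypothetical helper (not from the paper).

    Assumed problem: maximize sum(means[j] * x[j])
    subject to sum(costs[j] * x[j]) <= budget, sum(x) == 1, x >= 0.

    Vertices of this feasible set put weight on at most two populations,
    so enumerating singletons and cost-tight pairs is enough.
    Returns (best_value, best_x), or (None, None) if no policy is feasible.
    """
    k = len(means)
    best_value, best_x = None, None

    # Case 1: sample a single affordable population all the time.
    for j in range(k):
        if costs[j] <= budget and (best_value is None or means[j] > best_value):
            x = [0.0] * k
            x[j] = 1.0
            best_value, best_x = means[j], x

    # Case 2: mix a cheap and an expensive population so the cost bound is tight.
    for i, j in combinations(range(k), 2):
        lo, hi = (i, j) if costs[i] < costs[j] else (j, i)
        if not (costs[lo] < budget < costs[hi]):
            continue
        p = (budget - costs[lo]) / (costs[hi] - costs[lo])  # weight on the expensive one
        value = (1.0 - p) * means[lo] + p * means[hi]
        if best_value is None or value > best_value:
            x = [0.0] * k
            x[lo], x[hi] = 1.0 - p, p
            best_value, best_x = value, x

    return best_value, best_x
```

For instance, with assumed means `[1.0, 2.0, 5.0]`, costs `[1.0, 2.0, 4.0]`, and budget `2.0`, the sketch mixes the first and third populations rather than always sampling the second. An adaptive policy of the kind the abstract describes would have to approach this value without knowing the means in advance.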

Citations: cited by 3 publications (3 citation statements)
References: 11 publications
“…, k, where the randomization probabilities x_j are an optimal solution to the above linear program LP(θ), cf. Burnetas and Kanavetas [7], Burnetas and Katehakis [12]. However, such policy may not be feasible in our framework that requires C_π(n)/n ≤ c_0, ∀ n = 1, 2, …”
Section: Preliminaries (mentioning)
confidence: 99%
“…Tran-Thanh et al [47], considered the problem when the cost of activation of each arm is fixed and becomes known after the arm is used once. Burnetas and Kanavetas [7] considered a version of this problem and constructed a consistent policy (i.e., with regret R_π(n) = o(n), as n → ∞). In the present paper, we employ a stricter version of the average cost constraint that requires the average sampling cost not to exceed c_0 at any time period and not only in the limit.…”
Section: (13) (mentioning)
confidence: 99%
“…Further, the class of block-UCB (b-UCB) feasible policies which are developed here and achieve the asymptotic lower bound in the regret have a simpler form and are easier to compute than those in Burnetas et al (2017). We also refer to Burnetas and Kanavetas (2012) where a consistent policy (i.e., with regret o(n)) for the case of a single linear constraint was constructed.…”
Section: Introduction (mentioning)
confidence: 99%