2022
DOI: 10.1287/opre.2021.2100

Integrated Online Learning and Adaptive Control in Queueing Systems with Uncertain Payoffs

Abstract: Many online service platforms have dedicated algorithms to match their available resources to incoming clients to maximize client satisfaction. One of the key challenges is to balance the generation of higher payoffs from existing clients and the exploration of new clients’ unknown characteristics while at the same time satisfying the resource capacity constraints. In “Integrated Online Learning and Adaptive Control in Queueing Systems with Uncertain Payoffs,” Hsu, Xu, Lin, and Bell show that traditional approaches s…

Cited by 13 publications (25 citation statements)
References 21 publications
“…Then, we carefully incorporate the regret analysis for classical UCB algorithm (e.g., [5]) into our analysis. The analysis is similar to the line of regret analysis in [28] and [12], and is available in Appendix B.…”
Section: T) and The Cumulative Regret Reg(t)
Type: mentioning
confidence: 99%
“…Our algorithm randomly assigns jobs to servers in each time step with the expected number of assignments of jobs of a class to servers of a class determined by the solution of a system optimization problem, whose objective function consists of a fairness of allocation term and a reward maximization term. This dynamic allocation method follows the framework proposed in Hsu et al (2021) with the system optimization problem definition following the utility maximization framework studied in the context of network resource allocation. In our algorithm, the reward maximization part of the system optimization problem objective uses mean reward estimates computed by following an optimism in the face of uncertainty strategy developed for linear bandits in Abbasi-Yadkori et al (2011), which we adapted to our setting.…”
Section: Our Contributions
Type: mentioning
confidence: 99%
“…A closely related queueing system control problem was studied in Hsu et al (2021) with key differences in that we consider systems where the scheduler has access to features of jobs and servers and rewards of job-server assignments follow a bilinear model, and we also consider more general cases in which mean job service times are allowed to be different across job classes and the set of server classes can be time varying. The bilinear structure of rewards allows us to design algorithms that extend learning to be over job classes, which is different from the approach in Hsu et al (2021) where the learning is for each job separately. As a result, we obtain better regret bounds which scale sub-linearly rather than linearly with the time horizon T and remove the dependence on the number of server classes J by exploiting the structure of rewards.…”
Section: Related Work
Type: mentioning
confidence: 99%