We show that if performance measures in stochastic and dynamic scheduling problems satisfy generalized conservation laws, then the feasible region of achievable performance is a polyhedron called an extended polymatroid, that generalizes the classical polymatroids introduced by Edmonds. Optimization of a linear objective over an extended polymatroid is solved by an adaptive greedy algorithm, which leads to an optimal solution having an indexability property (indexable systems). Under a certain condition the indices possess a stronger decomposition property (decomposable systems). The following problems can be analyzed using our theory: multiarmed bandit problems, branching bandits, scheduling of multiclass queues (with or without feedback), scheduling of a batch of jobs. Consequences of our results include: (1) a characterization of indexable systems as systems that satisfy generalized conservation laws, (2) a sufficient condition for indexable systems to be decomposable, (3) a new linear programming proof of the decomposability property of Gittins indices in multiarmed bandit problems, (4) an approach to sensitivity analysis of indexable systems, (5) a characterization of the indices of indexable systems as sums of dual variables, and an economic interpretation of the branching bandit indices in terms of retirement options, (6) an analysis of the indexability of undiscounted branching bandits, (7) a new algorithm to compute the indices of indexable systems (in particular Gittins indices), as fast as the fastest known algorithm, (8) a unification of Klimov's algorithm for multiclass queues and Gittms' algorithm for multiarmed bandits as special cases of the same algorithm, (9) a closed formula for the maximum reward of the multiarmed bandit problem, with a new proof of its submodularity and (10) an understanding of the invariance of the indices with respect to some parameters of the problem. Our approach provides a polyhedral treatment of several classical problems in stochastic and dynamic scheduling and is able to address variations such as: discounted versus undiscounted cost criterion, rewards versus taxes, discrete versus continuous time, and linear versus nonlinear objective functions.
We develop a mathematical programming approach for the classical P S P A C E , hard restless bandit problem in stochastic optimization. We introduce a hierarchy o f n where n is the number of bandits increasingly stronger linear programming relaxations, the last of which is exact and corresponds to the exponential size formulation of the problem as a Markov decision chain, while the other relaxations provide bounds and are e ciently computed. We also propose a priority-index heuristic scheduling policy from the solution to the rst-order relaxation, where the indices are de ned in terms of optimal dual variables. In this way w e propose a policy and a suboptimality guarantee. We report results of computational experiments that suggest that the proposed heuristic policy is nearly optimal. Moreover, the second-order relaxation is found to provide strong bounds on the optimal value.
This paper develops a polyhedral approach to the design, analysis, and computation of dynamic allocation indices for scheduling binary-action (engage/rest) Markovian stochastic projects which can change state when rested (restless bandits (RBs)), based on partial conservation laws (PCLs). This extends previous work by the author [J. : Restless bandits, partial conservation laws and indexability. Adv. Appl. Probab. 33,, where PCLs were shown to imply the optimality of index policies with a postulated structure in stochastic scheduling problems, under admissible linear objectives, and they were deployed to obtain simple sufficient conditions for the existence of Whittle's (1988) RB index (indexability), along with an adaptive-greedy index algorithm. The new contributions include: (i) we develop the polyhedral foundation of the PCL framework, based on the structural and algorithmic properties of a new polytope associated with an accessible set system (J, F ) (F -extended polymatroid); (ii) we present new dynamic allocation indices for RBs, motivated by an admission control model, which extend Whittle's and have a significantly increased scope; (iii) we deploy PCLs to obtain both sufficient conditions for the existence of the new indices (PCL-indexability), and a new adaptive-greedy index algorithm; (iv) we interpret PCL-indexability as a form of the classic economics law of diminishing marginal returns, and characterize the index as an optimal marginal cost rate; we further solve a related optimal constrained control problem; (v) we carry out a PCL-indexability analysis of the motivating admission control model, under time-discounted and long-run average criteria; this gives, under mild conditions, a new index characterization of optimal threshold policies; and (vi) we apply the latter to present new heuristic index policies for two hard queueing control problems: admission control and routing to parallel queues; and scheduling a multiclass make-to-stock queue with lost sales, both under state-dependent holding cost rates and birth-death dynamics.
Priority allocation, Stochastic scheduling, Index policies, Restless bandits, Marginal productivity index, Indexability, Dynamic control of queues, Control by price, 90B36, 90C40, 90B05, 90B22, 90B18,
We show that if performance measures in a stochastic scheduling problem satisfy a set of so-called partial conservation laws (PCL), which extend previously studied generalized conservation laws (GCL), then the problem is solved optimally by a priority-index policy for an appropriate range of linear performance objectives, where the optimal indices are computed by a one-pass adaptive-greedy algorithm, based on Klimov's. We further apply this framework to investigate the indexability property of restless bandits introduced by Whittle, obtaining the following results: (1) we identify a class of restless bandits (PCL-indexable) which are indexable; membership in this class is tested through a single run of the adaptive-greedy algorithm, which also computes the Whittle indices when the test is positive; this provides a tractable sufficient condition for indexability; (2) we further identify the class of GCL-indexable bandits, which includes classical bandits, having the property that they are indexable under any linear reward objective. The analysis is based on the so-called achievable region method, as the results follow from new linear programming formulations for the problems investigated.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.