“…These authors extended the original value-iteration bounds of MacQueen (1966) for the discounted cost case to the average cost case. The modified value-iteration algorithm with a dynamic relaxation factor comes from Popyack et al (1979). The first proof of the geometric convergence of the undiscounted value-iteration algorithm was given by White (1963) under a very strong recurrence condition.…”
Tijms, H. C. A first course in stochastic models / Henk C. Tijms. p. cm. Includes bibliographical references and index. ISBN 0-471-49880-7 (acid-free paper), ISBN 0-471-49881-5 (pbk. : acid-free paper). 1. Stochastic processes. I. Title.
QA274.T46 2003 519.2'3-dc21 2002193371
British Library Cataloguing in Publication Data: A catalogue record for this book is available from the British Library. ISBN 0-471-49880-7 (Cloth), ISBN 0-471-49881-5 (Paper)
“…we see that if $\gamma_k = 1$ for all $k$, the new value iteration (7)-(8) becomes similar to the known value iteration (9)-(10): the updating formulas are the same in both methods, but the order of updating $\lambda$ is just reversed relative to the order of updating $h$. We note that there is also a variant of the standard method (9)-(10) that involves interpolations between $h_k$ and $h_{k+1}$ according to a stepsize parameter (see [Sch71], [Pla77], [Var78], [PBW79], [Put94], [Ber95]). However, the new method does not seem as closely related to this variant.…”
We propose a new value iteration method for the classical average cost Markovian Decision problem, under the assumption that all stationary policies are unichain and furthermore there exists a state that is recurrent under all stationary policies. This method is motivated by a relation between the average cost problem and an associated stochastic shortest path problem.
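For orientation, the following is a minimal sketch of standard relative value iteration for an average cost problem, the kind of known method the quoted passage contrasts with. It is not the new method proposed in the abstract above. The function name, the data layout (`P[a][i][j]`, `c[a][i]`), the choice of reference state, and the two-state example are all assumptions made for illustration.

```python
def relative_value_iteration(P, c, ref=0, tol=1e-10, max_iter=10_000):
    """Standard relative value iteration for an average cost MDP (illustrative sketch).

    P[a][i][j] : probability of moving from state i to state j under action a
    c[a][i]    : one-stage cost of action a in state i
    ref        : reference state whose differential cost is pinned to 0
    Returns (gain estimate, differential cost vector h, greedy policy).
    """
    num_actions = len(c)
    n = len(c[0])
    h = [0.0] * n
    gain = 0.0
    q = [[0.0] * num_actions for _ in range(n)]
    for _ in range(max_iter):
        # Bellman backup: q[i][a] = c(i, a) + sum_j p_ij(a) * h(j)
        q = [
            [c[a][i] + sum(P[a][i][j] * h[j] for j in range(n))
             for a in range(num_actions)]
            for i in range(n)
        ]
        th = [min(row) for row in q]
        # The backed-up value at the reference state estimates the average cost.
        gain = th[ref]
        new_h = [v - gain for v in th]
        diff = max(abs(x - y) for x, y in zip(new_h, h))
        h = new_h
        if diff < tol:
            break
    policy = [min(range(num_actions), key=lambda a: q[i][a]) for i in range(n)]
    return gain, h, policy


# Hypothetical two-state, two-action example (numbers chosen for illustration only).
P = [
    [[0.5, 0.5], [0.5, 0.5]],  # action 0
    [[0.9, 0.1], [0.9, 0.1]],  # action 1
]
c = [
    [1.0, 3.0],  # action 0
    [1.2, 4.0],  # action 1
]
gain, h, policy = relative_value_iteration(P, c)
print(gain, h, policy)
```

Subtracting the value at the reference state after each backup keeps the iterates bounded, which plain undiscounted value iteration does not guarantee. Convergence of this sketch relies on assumptions such as the unichain and aperiodicity conditions, which the example satisfies.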
“…This can be done by linear programming (LP) [3], [8], [12], [17], value iteration [4], [7], [9], [16], [18], [23], [25] or policy iteration [9], [10], [11]. Policy iteration was first proposed by Howard [9].…”
Given a family of Markov chains with a single recurrent class, we present a potential application of Schweitzer's exact formula relating the steady-state probability and fundamental matrices of any two chains in the family. We propose a new policy iteration scheme for Markov decision processes where in contrast to policy iteration, the new criterion for selecting an action ensures the maximal one-step average cost improvement. Its computational complexity and storage requirement are analysed.
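As background for the comparison drawn in this abstract, here is a minimal sketch of classical (Howard-style) policy iteration for an average cost, unichain problem. It is not the new scheme based on Schweitzer's formula. The helper names `solve_linear`, `evaluate`, and `policy_iteration`, the data layout, and the two-state example are assumptions made for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def evaluate(P, c, policy):
    """Solve g + h(i) = c(i) + sum_j p_ij h(j) with h(0) = 0 for a fixed policy."""
    n = len(policy)
    A, b = [], []
    for i in range(n):
        a = policy[i]
        # Unknowns are (g, h(1), ..., h(n-1)); h(0) is pinned to 0.
        row = [1.0] + [(1.0 if i == j else 0.0) - P[a][i][j] for j in range(1, n)]
        A.append(row)
        b.append(c[a][i])
    x = solve_linear(A, b)
    return x[0], [0.0] + x[1:]


def policy_iteration(P, c, max_iter=100):
    num_actions, n = len(c), len(c[0])
    policy = [0] * n
    for _ in range(max_iter):
        _, h = evaluate(P, c, policy)
        new_policy = []
        for i in range(n):
            q = [c[a][i] + sum(P[a][i][j] * h[j] for j in range(n))
                 for a in range(num_actions)]
            best = min(range(num_actions), key=lambda a: q[a])
            # Keep the current action on ties so the iteration cannot cycle.
            if q[policy[i]] <= q[best] + 1e-12:
                best = policy[i]
            new_policy.append(best)
        if new_policy == policy:
            break
        policy = new_policy
    gain, h = evaluate(P, c, policy)
    return gain, h, policy


# Hypothetical two-state, two-action example (numbers chosen for illustration only).
P = [
    [[0.5, 0.5], [0.5, 0.5]],  # action 0
    [[0.9, 0.1], [0.9, 0.1]],  # action 1
]
c = [
    [1.0, 3.0],  # action 0
    [1.2, 4.0],  # action 1
]
gain, h, policy = policy_iteration(P, c)
print(gain, h, policy)
```

The improvement step here picks, in each state, the action minimizing the one-step cost plus expected differential cost. The abstract describes replacing that selection criterion with one that maximizes the one-step average cost improvement.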