Toward an optimized value iteration algorithm for average cost Markov decision processes

Arruda, Edilson F.; Ourique, F.; Almudevar, Anthony

doi:10.1109/cdc.2010.5717895

Cited by 4 publications

(10 citation statements)

References 9 publications

“…Indeed, when a suitable decreasing rate is found, it can result in significant computational savings. However, a poor choice of decreasing rate may result in an inefficient algorithm, which can even be outperformed by standard value iteration [14]. In this paper we address this shortcoming by introducing an algorithm that adaptively decreases the error sequence k, and that results in a more robust algorithm, with more stable behavior that consistently outperforms standard value iteration.…”

Section: The Parameter Sequence K mentioning

confidence: 99%

“…The unknown rate of convergence renders the results in [13] not directly applicable for the studied problem. Earlier results, however, have shown that significant reduction in the overall computational effort can be attained by a suitable choice of refinement rate [14]. Unfortunately, such a rate is not known a priori and the parameter tuning turns out to be very difficult.…”

Section: Introduction mentioning

confidence: 99%

“…These experiments are replications of the experiments presented in [14] and thus offer a ground for comparison. In the first experiment we solve a Queueing model with two classes of clients.…”

Section: Numerical Experiments mentioning

confidence: 99%

“…This renders the direct application of the results in [13] impractical. Indeed, geometrically decreasing sequences k were tried in [14], and promising results were obtained. The difficulty in such an approach lies in the fact that guessing the convergence rate a priori can be quite a daunting task.…”

Section: The Parameter Sequence K mentioning

confidence: 99%

“…Hence, the sequence k  can be freely selected from the class of convergent sequences in the interval   0,1 whose limit is nil. However, it is the form at which the convergent sequence goes to zero that will ultimately determine the behavior and, therefore, the computational effort, of the PIVI algorithm [13,14].…”

Section: The Parameter Sequence K mentioning

confidence: 99%

Adaptive Strategies for Accelerating the Convergence of Average Cost Markov Decision Processes Using a Moving Average Digital Filter

Arruda, Ourique

2013

AJOR

Self Cite

This paper proposes a technique to accelerate the convergence of the value iteration algorithm applied to discrete average cost Markov decision processes. An adaptive partial information value iteration algorithm is proposed that updates an increasingly accurate approximate version of the original problem with a view to saving computations at the early iterations, when one is typically far from the optimal solution. The proposed algorithm is compared to classical value iteration for a broad set of adaptive parameters and the results suggest that significant computational savings can be obtained, while also ensuring a robust performance with respect to the parameters.
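For orientation, the classical baseline that the abstract compares against can be sketched as relative value iteration for an average cost MDP. This is a minimal sketch under stated assumptions, not the adaptive partial information algorithm proposed in the paper: the function name `relative_value_iteration`, the data layout (`P[a][s][t]` for transition probabilities, `c[s][a]` for one-step costs), the span-based stopping rule, and the toy two-state example are all illustrative choices, and the chain is assumed to be unichain and aperiodic.

```python
def relative_value_iteration(P, c, tol=1e-8, max_iter=10_000, ref_state=0):
    """Classical relative value iteration for an average cost MDP.

    P[a][s][t] is the probability of moving from state s to state t under
    action a; c[s][a] is the one-step cost. Assumes a unichain, aperiodic
    model (hypothetical layout chosen for this sketch).
    Returns (gain estimate, relative value function, greedy policy).
    """
    n_states = len(c)
    n_actions = len(c[0])

    def q_values(h):
        # Q(s, a) = c(s, a) + sum_t P(t | s, a) * h(t)
        return [
            [c[s][a] + sum(P[a][s][t] * h[t] for t in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)
        ]

    h = [0.0] * n_states
    gain = 0.0
    for _ in range(max_iter):
        th = [min(row) for row in q_values(h)]
        diff = [th[s] - h[s] for s in range(n_states)]
        # min(diff) <= optimal average cost <= max(diff)
        gain = 0.5 * (max(diff) + min(diff))
        # Subtract the value at a reference state to keep iterates bounded.
        h = [v - th[ref_state] for v in th]
        if max(diff) - min(diff) < tol:
            break

    q = q_values(h)
    policy = [min(range(n_actions), key=lambda a: q[s][a])
              for s in range(n_states)]
    return gain, h, policy


# Toy example (made up for illustration): 2 states, 2 actions.
P = [
    [[0.5, 0.5], [0.5, 0.5]],  # action 0
    [[0.9, 0.1], [0.9, 0.1]],  # action 1
]
c = [
    [1.0, 2.0],  # costs in state 0 for actions 0 and 1
    [3.0, 4.0],  # costs in state 1 for actions 0 and 1
]
gain, h, policy = relative_value_iteration(P, c)
```

Every sweep here evaluates the full Bellman operator over all states and actions. That per-iteration cost is what an approximate-model scheme of the kind described in the abstract aims to reduce during the early iterations.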

Reinforcement Learning for Scheduling Wireless Powered Sensor Communications

Li, Tovar

2019

IEEE Trans. on Green Commun. Netw.

Wireless Power Transfer and Data Collection in Wireless Sensor Networks

Duan et al.

2018

IEEE Trans. Veh. Technol.

In a rechargeable wireless sensor network, the data packets are generated by sensor nodes at a specific data rate, and transmitted to a base station. Moreover, the base station transfers power to the nodes by using Wireless Power Transfer (WPT) to extend their battery life. However, inadequately scheduling WPT and data collection causes some of the nodes to drain their battery and have their data buffer overflow, while the other nodes waste their harvested energy, which is more than they need to transmit their packets. In this paper, we investigate a novel optimal scheduling strategy, called EHMDP, aiming to minimize data packet loss from a network of sensor nodes in terms of the nodes' energy consumption and data queue state information. The scheduling problem is first formulated by a centralized MDP model, assuming that the complete states of each node are well known by the base station. This presents the upper bound of the data that can be collected in a rechargeable wireless sensor network. Next, we relax the assumption of the availability of full state information so that the data transmission and WPT can be semi-decentralized. The simulation results show that, in terms of network throughput and packet loss rate, the proposed algorithm significantly improves the network performance.
