Markov Decision Process Parallel Value Iteration Algorithm On GPU

Chen, Peng; Lu, Lu

doi:10.2991/isca-13.2013.51

Cited by 10 publications

(4 citation statements)

References 8 publications


“…[44] provides a proof of concept where the value iteration algorithm was mapped to a GPU platform using CUDA. Value iteration was also implemented using OpenCL for path finding problems in [45]. [46] uses OpenMP and MPI to implement a very large-scale MDP to generate traffic control tables.…”

Section: A Previous Progress In Optimal Path Planning (mentioning)

confidence: 99%

Optimal Path Planning of Autonomous Marine Vehicles in Stochastic Dynamic Ocean Flows using a GPU-Accelerated Algorithm

Chowdhury, Subramani. 2021. Preprint.


Autonomous marine vehicles play an essential role in many ocean science and engineering applications. Planning time and energy optimal paths for these vehicles to navigate in stochastic dynamic ocean environments is essential to reduce operational costs. In some missions, they must also harvest solar, wind, or wave energy (modeled as a stochastic scalar field) and move in optimal paths that minimize net energy consumption. Markov Decision Processes (MDPs) provide a natural framework for sequential decision making for robotic agents in such environments. However, building a realistic model and solving the modeled MDP becomes computationally expensive in large-scale real-time applications, warranting the need of parallel algorithms and efficient implementation. In the present work, we introduce an efficient end-to-end GPU-accelerated algorithm that (i) builds the MDP model (computing transition probabilities and expected one-step rewards); and (ii) solves the MDP to compute an optimal policy. We develop methodical and algorithmic solutions to overcome the limited global memory of GPUs by (i) using a dynamic reduced-order representation of the ocean flows, (ii) leveraging the sparse nature of the state transition probability matrix, (iii) introducing a neighbouring sub-grid concept and (iv) proving that it is sufficient to use only the stochastic scalar field's mean to compute the expected one-step rewards for missions involving energy harvesting from the environment; thereby saving memory and reducing the computational effort. We demonstrate the algorithm on a simulated stochastic dynamic environment and highlight that it builds the MDP model and computes the optimal policy 600-1000x faster than conventional CPU implementations, making it suitable for real-time use.
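The role of sparsity in step (ii) of this abstract can be illustrated with a minimal sketch. It is not the authors' GPU implementation: it is plain Python on a hypothetical three-state chain, and the names `TRANSITIONS`, `bellman_backup`, and `value_iteration`, as well as all probabilities and rewards, are assumptions made for illustration. Only non-zero transitions are stored, and each state's update reads only the previous value table, which is the property that lets a GPU update many states at once.

```python
# Toy sparse MDP (illustrative assumption, not the paper's ocean model).
# transitions[s][a] lists only the non-zero outcomes as
# (next_state, probability, reward) tuples.

def bellman_backup(transitions, values, gamma):
    """One synchronous sweep: each new value depends only on the old table."""
    new_values, policy = [], []
    for actions in transitions:
        best_q, best_a = float("-inf"), None
        for a, outcomes in enumerate(actions):
            # Expected one-step reward plus discounted value of successors.
            q = sum(p * (r + gamma * values[s2]) for s2, p, r in outcomes)
            if q > best_q:
                best_q, best_a = q, a
        new_values.append(best_q)
        policy.append(best_a)
    return new_values, policy


def value_iteration(transitions, gamma=0.9, tol=1e-8, max_sweeps=10_000):
    """Repeat sweeps until the largest change in any state falls below tol."""
    values = [0.0] * len(transitions)
    policy = [0] * len(transitions)
    for _ in range(max_sweeps):
        new_values, policy = bellman_backup(transitions, values, gamma)
        delta = max(abs(n - v) for n, v in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values, policy


# State 2 is an absorbing goal. Action 0 stays put; action 1 tries to move
# right and succeeds with probability 0.8. Every non-goal step costs 1.
TRANSITIONS = [
    [[(0, 1.0, -1.0)], [(1, 0.8, -1.0), (0, 0.2, -1.0)]],
    [[(1, 1.0, -1.0)], [(2, 0.8, -1.0), (1, 0.2, -1.0)]],
    [[(2, 1.0, 0.0)], [(2, 1.0, 0.0)]],
]

values, policy = value_iteration(TRANSITIONS)
print(policy, [round(v, 4) for v in values])
```

Storing outcome lists instead of a dense state-by-state matrix keeps memory proportional to the number of reachable successors, which is the general idea behind exploiting a sparse transition matrix.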



“…2 Related work Jóhannsson (2009) demonstrated that value iteration could be effectively run in parallel on a GPU soon after the introduction of CUDA. Subsequent research has evaluated the performance of GPU-accelerated value iteration on problems from economics and finance (Aamer et al 2020; Aldrich et al 2011; Duarte et al 2020; Kirkby 2017; Kirkby 2022) and route-finding and navigation (Chen and Lu 2013; Constantinescu et al 2020; Inamoto et al 2011; Ruiz and Hernández 2015). We have only identified a single study that applied this approach to an inventory control problem: Ortega et al (2019) implemented a custom value iteration algorithm in CUDA to find replenishment policies for a subset of perishable inventory problems originally described by Hendrix et al (2019).…”

Section: Introduction (mentioning)

confidence: 99%

Going faster to see further: GPU-accelerated value iteration and simulation for perishable inventory control using JAX

Farrington, Li, Wong, et al. 2023. Preprint.


Value iteration can find the optimal replenishment policy for a perishable inventory problem, but is computationally demanding due to the large state spaces that are required to represent the age profile of stock. The parallel processing capabilities of modern GPUs can reduce the wall time required to run value iteration by updating many states simultaneously. The adoption of GPU-accelerated approaches has been limited in operational research relative to other fields like machine learning, in which new software frameworks have made GPU programming widely accessible. We used the Python library JAX to implement value iteration and simulators of the underlying Markov decision processes in a high-level API, and relied on this library's function transformations and compiler to efficiently utilize GPU hardware. Our method can extend use of value iteration to settings that were previously considered infeasible or impractical. We demonstrate this on example scenarios from three recent studies which include problems with over 16 million states and additional problem features, such as substitution between products, that increase computational complexity. We compare the performance of the optimal replenishment policies to heuristic policies, fitted using simulation optimization in JAX which allowed the parallel evaluation of multiple candidate policy parameters on thousands of simulated years. The heuristic policies gave a maximum optimality gap of 2.49%. Our general approach may be applicable to a wide range of problems in operational research that would benefit from large-scale parallel computation on consumer-grade GPU hardware.
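The simulation optimization step mentioned in this abstract, fitting a heuristic policy by evaluating several candidate parameters on simulated demand, can be sketched as follows. This is plain Python rather than JAX, and it is not the authors' model: the one-period shelf life, the cost values, the fixed demand trace `DEMANDS`, and the function names `simulate_cost` and `fit_order_up_to` are all assumptions chosen to keep the example small. Each candidate is evaluated independently of the others, which is what makes parallel evaluation possible.

```python
# Toy perishable setting (assumption): stock lasts a single period, so any
# units left after demand is served are wasted at the end of that period.

def simulate_cost(order_up_to, demands, waste_cost=1.0, shortage_cost=4.0):
    """Mean cost per period of always stocking up to `order_up_to` units."""
    total = 0.0
    for d in demands:
        wasted = max(order_up_to - d, 0)  # unsold units that perish
        short = max(d - order_up_to, 0)   # demand that could not be met
        total += waste_cost * wasted + shortage_cost * short
    return total / len(demands)


def fit_order_up_to(candidates, demands):
    """Evaluate every candidate on the same demand trace and keep the cheapest."""
    costs = {s: simulate_cost(s, demands) for s in candidates}
    best = min(costs, key=costs.get)
    return best, costs


# Fixed hypothetical demand trace, so the result is reproducible.
DEMANDS = [2, 4, 3, 5, 1, 4, 3, 6]

best_level, costs = fit_order_up_to(range(0, 8), DEMANDS)
print(best_level, costs[best_level])  # 5 2.125
```

Using one shared demand trace for all candidates means differences in cost reflect the policy parameter rather than sampling noise.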


“…Johannson [10] provides a proof of principle, where CUDA is used to map the VI process on a GPU platform. Chen and Lu [3] consider OpenCL for implementing a VI for a path finding problem. Herrera et al [9] use HPC to implement a very large-scale MDP for the generation of traffic control tables.…”

Section: Introduction (mentioning)

confidence: 99%

A CUDA approach to compute perishable inventory control policies using value iteration

2018


Dynamic programming (DP) approaches, in particular value iteration, are often seen as a way to derive optimal policies in inventory management. The challenge in this approach is to deal with a growing state space when handling realistic problems. As a large part of world food production is thrown out because it is perishable, there is motivation to take a close look at order policies in retail. Recent work has begun to consider substitution of one product by another when one is out of stock. Taking substitution into account in a policy requires a larger state space. Therefore, we investigate the potential of using GPU platforms to derive optimal policies as the number of products considered simultaneously increases. First results show the potential of the GPU approach to accelerate computation in value iteration for DP.
