Reinforcement learning (RL) is a promising tool for solving robust optimal well control problems in which the model parameters are highly uncertain and the system is only partially observable in practice. However, learning robust control policies with RL often requires a large number of simulations, which can easily become computationally intractable when each simulation is expensive. To address this bottleneck, an adaptive multigrid RL framework is introduced, inspired by the principles of geometric multigrid methods used in iterative numerical algorithms. RL control policies are initially learned using computationally efficient low-fidelity simulations based on coarse grid discretizations of the underlying partial differential equations (PDEs). The simulation fidelity is then increased adaptively towards the highest-fidelity simulation, which corresponds to the finest discretization of the model domain. The proposed framework is demonstrated using a state-of-the-art, model-free, policy-based RL algorithm, namely the proximal policy optimization (PPO) algorithm. Results are shown for two case studies of robust optimal well control problems inspired by the SPE-10 Model 2 benchmark. Prominent gains in computational efficiency are observed using the proposed framework, saving around 60-70% of the computational cost relative to its single fine-grid counterpart.
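A minimal sketch of how such a coarse-to-fine training schedule could be organized is given below in Python. All names here (GRID_LEVELS, make_reservoir_env, ppo_update) are illustrative stubs introduced for this sketch, not the authors' implementation, and the stall-detection rule is an assumption standing in for whatever adaptive criterion the framework uses to trigger grid refinement.

    import random

    GRID_LEVELS = [(15, 5), (30, 10), (60, 20)]   # coarse -> fine grid discretizations (hypothetical)

    def make_reservoir_env(grid):
        """Stub simulator at a given grid resolution (placeholder, not a real PDE solver)."""
        def rollout(policy_params):
            # Stand-in for simulating one episode and returning its cumulative reward;
            # coarser grids are modelled here as cheaper but noisier evaluations.
            return sum(policy_params) + random.gauss(0.0, 1.0 / grid[0])
        return rollout

    def ppo_update(policy_params, returns):
        """Stub for one PPO-style policy update from a batch of episode returns."""
        avg = sum(returns) / len(returns)
        return [p + 0.01 * avg for p in policy_params]

    policy = [0.0, 0.0]
    for grid in GRID_LEVELS:                      # adaptively increase simulation fidelity
        env = make_reservoir_env(grid)
        best, stall = float("-inf"), 0
        for _ in range(50):                       # training budget at this fidelity level
            returns = [env(policy) for _ in range(8)]
            policy = ppo_update(policy, returns)
            avg = sum(returns) / len(returns)
            best, stall = (avg, 0) if avg > best + 1e-3 else (best, stall + 1)
            if stall >= 5:                        # progress has stalled: move to a finer grid
                break
    print("final policy parameters:", policy)

The key design choice illustrated is that most policy updates happen on cheap coarse-grid simulations, and the expensive fine-grid simulator is only used once learning on coarser grids has saturated.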
Reinforcement learning (RL) is a promising method for solving control problems (Dixit and ElSheikh, 2022). However, model-free RL algorithms are sample-inefficient and require thousands, if not millions, of samples to learn optimal control policies. A major source of computational cost in RL is the transition function, which is dictated by the model dynamics. This is especially problematic when the model dynamics are represented by coupled PDEs, since the transition function then involves solving a large-scale discretization of those PDEs. We propose a multilevel RL framework to reduce this cost by exploiting sublevel models that correspond to coarser-scale discretizations (i.e., multilevel models). This is done by formulating an approximate multilevel Monte Carlo estimate (inspired by Giles (2015)) of the objective function of the policy and/or value network, instead of the Monte Carlo estimates used in the classical framework. As a demonstration of this framework, we present a multilevel version of the proximal policy optimization (PPO) algorithm. Here, the level refers to the grid fidelity of the chosen simulation-based environment. We provide two examples of simulation-based environments that employ stochastic PDEs solved using finite-volume discretization. For the case studies presented, we observed substantial computational savings using multilevel PPO compared to its classical counterpart.
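For readers unfamiliar with the multilevel Monte Carlo idea of Giles (2015), the telescoping decomposition below sketches the kind of estimator being described. The notation is our own: J_l denotes the objective evaluated with the level-l simulator, L is the finest grid level, theta the policy parameters, and N_l the number of samples drawn per level; the exact form used for the policy and value objectives is defined in the paper itself.

    % Telescoping decomposition of the finest-level objective across grid levels
    \mathbb{E}\big[J_L(\theta)\big]
      = \mathbb{E}\big[J_0(\theta)\big]
      + \sum_{l=1}^{L}\mathbb{E}\big[J_l(\theta) - J_{l-1}(\theta)\big]

    % Corresponding multilevel Monte Carlo estimator, with N_l samples per level
    % (most samples taken on the cheap coarse levels, few on the expensive fine ones)
    \widehat{J}(\theta)
      = \frac{1}{N_0}\sum_{i=1}^{N_0} J_0^{(i)}(\theta)
      + \sum_{l=1}^{L}\frac{1}{N_l}\sum_{i=1}^{N_l}\Big(J_l^{(i)}(\theta) - J_{l-1}^{(i)}(\theta)\Big)

Because the level differences J_l - J_{l-1} have small variance when adjacent grids produce similar returns, few expensive fine-level samples are needed, which is where the computational savings come from.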