Although it is one of the most popular and extensively studied approaches to designing water reservoir operations, Stochastic Dynamic Programming is plagued by a dual curse that makes it unsuitable for large water systems: the computational requirement grows exponentially with the number of state variables considered (curse of dimensionality), and an explicit model must be available to describe every system transition and the associated rewards/costs (curse of modeling). A variety of simplifications and approximations have been devised in the past which, in many cases, make the resulting operating policies inefficient and of little relevance in practical contexts. In this paper, a reinforcement-learning approach, called fitted Q-iteration, is presented: it combines the principle of continuous approximation of the value functions with a process of learning off-line from experience to design daily, cyclostationary operating policies. The continuous approximation, performed via tree-based regression, makes it possible to mitigate the curse of dimensionality by adopting a very coarse discretization grid compared with the dense grid required to design an equally performing policy via Stochastic Dynamic Programming. The learning experience, in the form of a data set generated by combining historical observations and model simulations, allows us to overcome the curse of modeling. The Lake Como water system (Italy) is used as the study site to infer general guidelines on the appropriate setting of the algorithm parameters and to demonstrate the advantages of the approach, in terms of accuracy and computational effectiveness, over traditional Stochastic Dynamic Programming. Citation: Castelletti, A., S. Galelli, M. Restelli, and R. Soncini-Sessa (2010), Tree-based reinforcement learning for optimal water reservoir operation, Water Resour. Res., 46, W09507.
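As a concrete illustration of the batch fitted Q-iteration scheme described in the abstract above, the sketch below uses scikit-learn's ExtraTreesRegressor as the tree-based regressor. The dataset format, the coarse discrete action grid, the discount factor, and all function and parameter names are assumptions of this sketch (an infinite-horizon discounted simplification, not the authors' daily cyclostationary implementation).

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(samples, actions, gamma=0.99, n_iterations=50):
    """Batch fitted Q-iteration over a fixed set of one-step transitions.

    samples : list of (state, action, reward, next_state) tuples, e.g.
              collected from historical observations and model simulations.
    actions : coarse discrete grid of candidate release decisions (assumed).
    """
    states = np.array([np.atleast_1d(s) for s, a, r, s2 in samples])
    acts = np.array([a for s, a, r, s2 in samples]).reshape(-1, 1)
    rewards = np.array([r for s, a, r, s2 in samples])
    next_states = np.array([np.atleast_1d(s2) for s, a, r, s2 in samples])

    X = np.hstack([states, acts])           # regression inputs: (state, decision)
    q_model = None
    for _ in range(n_iterations):
        if q_model is None:
            targets = rewards               # Q_1 = immediate reward
        else:
            # Q_{k+1}(s, a) = r + gamma * max_a' Q_k(s', a')
            q_next = np.column_stack([
                q_model.predict(np.hstack([next_states,
                                           np.full((len(samples), 1), a)]))
                for a in actions
            ])
            targets = rewards + gamma * q_next.max(axis=1)
        q_model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q_model
```

The learned model can then be turned into an operating policy by selecting, in each state, the decision with the highest predicted Q-value.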
The main objective of transfer in reinforcement learning is to reduce the complexity of learning the solution of a target task by effectively reusing the knowledge retained from solving a set of source tasks. In this paper, we introduce a novel algorithm that transfers samples (i.e., tuples (s, a, s′, r)) from source to target tasks. Under the assumption that tasks have similar transition models and reward functions, we propose a method to select the samples from the source tasks that are most similar to the target task and then to use them as input for batch reinforcement-learning algorithms. As a result, the number of samples an agent needs to collect from the target task to learn its solution is reduced. We empirically show that, following the proposed approach, the transfer of samples is effective in reducing the learning complexity, even when some source tasks are significantly different from the target task.
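A minimal sketch of the sample-transfer step described above, assuming a simple Euclidean nearest-neighbour proxy for task similarity; the selection criterion, function name, and keep_fraction parameter are illustrative, not the paper's actual method.

```python
import numpy as np

def select_transfer_samples(target_samples, source_samples, keep_fraction=0.5):
    """Keep the source transitions whose flattened (s, a, s', r) vectors lie
    closest to some target transition, then merge them with the target batch
    for a batch RL algorithm such as fitted Q-iteration."""
    T = np.array([np.hstack(t) for t in target_samples])   # flatten tuples
    S = np.array([np.hstack(t) for t in source_samples])
    # distance from each source sample to its nearest target sample
    dists = np.min(np.linalg.norm(S[:, None, :] - T[None, :, :], axis=2), axis=1)
    n_keep = int(keep_fraction * len(source_samples))
    keep_idx = np.argsort(dists)[:n_keep]
    return list(target_samples) + [source_samples[i] for i in keep_idx]
```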
Some problems in physics can be handled only after a suitable ansatz solution has been guessed. Such a method resists generalization and is therefore of limited scope. Coherent transport by adiabatic passage of a quantum state through an array of semiconductor quantum dots is a paradigmatic example of this approach, requiring the introduction of the so-called counter-intuitive control-gate ansatz pulse sequence. By contrast, deep reinforcement learning has proven able to solve very complex sequential decision-making problems involving competition between short-term and long-term rewards, despite a lack of prior knowledge. We show that, for this problem, deep reinforcement learning discovers control sequences that outperform the counter-intuitive ansatz sequence. Even more interestingly, it discovers novel strategies when realistic disturbances affect the ideal system, achieving better speed and fidelity when energy detuning between the quantum dot ground states or dephasing is added to the master equation, and also mitigating the effects of losses. This method enables online updating of realistic systems, as policy convergence is accelerated by exploiting prior knowledge when it is available. Deep reinforcement learning proves effective at controlling the dynamics of quantum states and, more generally, applies whenever an ansatz solution is unknown or insufficient to treat the problem effectively.
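For illustration only, the sketch below sets up a toy, loss-free three-dot transport environment in the spirit of the problem described above: at each time step the agent picks the two tunnelling amplitudes, and the final reward is the transfer fidelity to the third dot. The Hamiltonian, pulse levels, horizon, and unitary (master-equation-free) dynamics are assumptions of this sketch; any deep reinforcement learning agent (e.g., a DQN) could then be trained against it.

```python
import numpy as np
from scipy.linalg import expm

class ToyCTAPEnv:
    """Toy three-dot coherent-transport environment (illustrative only)."""

    def __init__(self, n_steps=40, total_time=20.0, omega_levels=(0.0, 0.5, 1.0)):
        self.n_steps, self.dt = n_steps, total_time / n_steps
        # discrete action set: all (Omega_12, Omega_23) amplitude pairs
        self.actions = [(a, b) for a in omega_levels for b in omega_levels]
        self.reset()

    def reset(self):
        self.psi = np.array([1.0, 0.0, 0.0], dtype=complex)  # start in dot 1
        self.t = 0
        return np.concatenate([self.psi.real, self.psi.imag])

    def step(self, action_index):
        o12, o23 = self.actions[action_index]
        H = np.array([[0, o12, 0], [o12, 0, o23], [0, o23, 0]], dtype=complex)
        self.psi = expm(-1j * H * self.dt) @ self.psi      # unitary evolution
        self.t += 1
        done = self.t == self.n_steps
        reward = abs(self.psi[2]) ** 2 if done else 0.0    # fidelity on dot 3
        obs = np.concatenate([self.psi.real, self.psi.imag])
        return obs, reward, done
```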
Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective or assumes that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems. It is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods and who wish to adopt a multi-objective perspective on their research, as well as at practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.
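For concreteness, the snippet below shows the linear-scalarization shortcut the guide warns about: a vector-valued reward is collapsed into a single scalar by a fixed weight vector, mapping the problem back to single-objective RL. The objective names and weights are illustrative; a well-known limitation of this shortcut is that it cannot recover Pareto-optimal policies lying in concave regions of the front.

```python
import numpy as np

def scalarize(reward_vector, weights):
    """Linear scalarization of a multi-objective reward (the common shortcut)."""
    return float(np.dot(weights, reward_vector))

# e.g. a two-objective reward (flood damage avoided, hydropower produced)
print(scalarize(np.array([0.7, 0.3]), np.array([0.5, 0.5])))   # -> 0.5
```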
The operation of large-scale water resources systems often involves several conflicting and noncommensurable objectives. Fully characterizing the trade-offs among them is a necessary step to inform and support decisions in the absence of a unique optimal solution. In this context, the common approach is to consider many single-objective problems, resulting from different combinations of the original problem objectives, each one solved using standard optimization methods based on mathematical programming. This scalarization process is computationally very demanding, as it requires one optimization run for each trade-off, and often results in very sparse and poorly informative representations of the Pareto frontier. More recently, bio-inspired methods have been applied to compute an approximation of the Pareto frontier in a single run. These methods make it possible to cover the full extent of the Pareto frontier acceptably with a reasonable computational effort. Yet the quality of the policies obtained may depend strongly on the algorithm tuning and preconditioning. In this paper we propose a novel multiobjective reinforcement learning algorithm that combines the advantages of the above two approaches and alleviates some of their drawbacks. The proposed algorithm is an extension of fitted Q-iteration (FQI) that makes it possible to learn the operating policies for all the linear combinations of preferences (weights) assigned to the objectives in a single training process. The key idea of multiobjective FQI (MOFQI) is to extend the continuous approximation of the value function, which single-objective FQI performs over the state-decision space, to the weight space as well. The approach is demonstrated on a real-world case study concerning the optimal operation of the Hoa Binh reservoir on the Da River, Vietnam. MOFQI is compared with the reiterated use of FQI and with a multiobjective parameterization-simulation-optimization (MOPSO) approach. Results show that MOFQI provides a continuous approximation of the Pareto front with accuracy comparable to the reiterated use of FQI. MOFQI outperforms MOPSO when no a priori knowledge of the operating policy shape is available, while producing slightly less accurate solutions when MOPSO can exploit such knowledge. Citation: Castelletti, A., F. Pianosi, and M. Restelli (2013), A multiobjective reinforcement learning approach to water resources systems operation: Pareto frontier approximation in a single run, Water Resour. Res., 49, 3476–3486.
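A sketch of the weight-space enlargement that defines MOFQI, under the same assumptions as the fitted Q-iteration sketch after the first abstract: each transition is replicated for a set of weight vectors drawn on the simplex, the vector-valued reward is scalarized, and the weights are appended to the state, so that a single FQI run learns Q(state, decision, weights) for all linear combinations of the objectives. The weight-sampling scheme and all names are illustrative.

```python
import numpy as np

def build_mofqi_batch(samples, n_weights=20, seed=0):
    """Augment (s, a, r_vec, s') transitions with weight vectors so that one
    fitted Q-iteration run covers every linear combination of objectives."""
    rng = np.random.default_rng(seed)
    n_obj = len(samples[0][2])
    augmented = []
    for w in rng.dirichlet(np.ones(n_obj), size=n_weights):
        for s, a, r_vec, s2 in samples:
            augmented.append((np.hstack([s, w]),         # state enlarged with weights
                              a,
                              float(np.dot(w, r_vec)),   # scalarized reward
                              np.hstack([s2, w])))       # weights are constant in time
    return augmented
```

The augmented batch has the same (state, action, reward, next_state) layout as before, so it can be fed directly to the fitted_q_iteration sketch above.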