Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in the state space and the action space, and high stochasticity. We present approaches that mitigate each of these curses. To handle the state-space explosion, we introduce "tabular linear functions" that generalize tile coding and linear value functions. Action-space complexity is reduced by replacing exhaustive search of the joint action space with a form of hill climbing. To deal with high stochasticity, we introduce a new algorithm called ASH-learning, an afterstate version of H-learning. Our extensions make it practical to apply reinforcement learning to a domain of product delivery, an optimization problem that combines inventory control and vehicle routing.
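The "tabular linear functions" named above combine a lookup table over discrete state features with a linear function over continuous ones: with no continuous features the structure reduces to a plain table, and with a single table cell it reduces to one global linear function. Below is a minimal sketch of that idea; the class name, the feature split, the learning rate, and the product-delivery feature example are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import defaultdict

class TabularLinearFunction:
    """Tabular over a discrete feature key, linear over continuous
    features; generalizes both a lookup table and a single linear
    value function, as the abstract describes."""

    def __init__(self, n_continuous, lr=0.1):
        self.lr = lr
        # One weight vector (plus a bias slot) per discrete cell.
        self.weights = defaultdict(lambda: np.zeros(n_continuous + 1))

    def _features(self, x_cont):
        # Append a bias term so each cell can hold a constant offset.
        return np.append(np.asarray(x_cont, dtype=float), 1.0)

    def value(self, discrete_key, x_cont):
        return float(self.weights[discrete_key] @ self._features(x_cont))

    def update(self, discrete_key, x_cont, target):
        # Gradient step toward a bootstrapped target value.
        phi = self._features(x_cont)
        error = target - self.weights[discrete_key] @ phi
        self.weights[discrete_key] += self.lr * error * phi

# Hypothetical usage: discrete key = truck location, continuous
# features = shop inventory levels.
vf = TabularLinearFunction(n_continuous=3)
vf.update(("depot",), [0.2, 0.5, 0.9], target=4.0)
print(vf.value(("depot",), [0.2, 0.5, 0.9]))
```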
The development of Fault Detection and Identification (FDI) systems for complex mechatronic systems is a challenging process. Many quantitative and qualitative fault detection methods have been proposed in the literature. Few of these methods address multiple faults; instead, the emphasis is placed on accurately proving that a single fault exists. This omission of multiple faults limits the capability of most fault detection methods. The Functional Failure Identification and Propagation (FFIP) framework has been used in past research for various applications related to fault propagation in complex systems. In this paper, a Hierarchical Functional Fault Detection and Identification (HFFDI) system is proposed. The HFFDI system is built on machine learning techniques, commonly used as a basis for FDI systems, and on the functional system decomposition of the FFIP framework. It is composed of a plant-wide FDI system and function-specific FDI systems. The HFFDI aims at identifying multiple faults occurring in different system functions while training only on single-fault data sets. The methodology is applied to a case study of a generic nuclear power plant with 17 system functions. Compared with a plant-wide FDI system alone, in multiple-fault scenarios the HFFDI gave better results for identifying one fault and was also able to identify more than one fault. The case study results show that in two-fault scenarios the HFFDI was able to identify one of the faults with 79% accuracy and both faults with 13% accuracy. In three-fault scenarios the HFFDI was able to identify one of the faults with 69% accuracy, two faults with 22% accuracy, and all three faults with 1% accuracy.
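A minimal sketch of the hierarchical composition described above. The fusion rule here (consult every function-specific classifier and report all non-nominal answers, falling back to the plant-wide classifier) is one plausible reading of the abstract, not the paper's exact algorithm; the classifier objects are assumed to expose scikit-learn-style fit/predict, and the "nominal" label is an assumption.

```python
import numpy as np

class HFFDI:
    """Hierarchical FDI: a plant-wide fault classifier plus one
    function-specific classifier per system function, each trained
    only on single-fault (and nominal) data."""

    def __init__(self, plant_clf, function_clfs, function_of_fault):
        self.plant = plant_clf          # any fit/predict classifier
        self.local = function_clfs      # {function id: classifier}
        self.fn_of = function_of_fault  # {fault label: function id}

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.plant.fit(X, y)
        for fn, clf in self.local.items():
            # Each local classifier sees only faults of its own
            # function, plus nominal samples as a "no fault" class.
            mask = np.array([self.fn_of.get(label) == fn or label == "nominal"
                             for label in y])
            clf.fit(X[mask], y[mask])
        return self

    def identify(self, x):
        # Ask every function-specific classifier; keeping all
        # non-nominal answers is what allows several simultaneous
        # faults (in different functions) to be reported.
        found = [clf.predict([x])[0] for clf in self.local.values()]
        faults = [f for f in found if f != "nominal"]
        return faults or [self.plant.predict([x])[0]]
```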
Scaling multiagent reinforcement learning to domains with many agents is a difficult problem. In particular, multiagent credit assignment becomes a key issue as the system size increases. Some multiagent systems have a global reward signal that is very noisy or hard to analyze, which makes it difficult to derive a learnable local reward signal. Difference rewards (a particular instance of reward shaping) have been used to alleviate this concern, but they remain difficult to compute in many domains. In this paper we present an approach that models the global reward using function approximation, allowing local rewards to be computed quickly. We demonstrate how this model can yield significant improvements in behavior on three congestion problems: a multiagent "bar problem", a complex simulation of the United States airspace, and a generic air traffic domain. We show how the model of the global reward may be learned either online or offline, using either linear functions or neural networks. For the bar problem, we show an increase in reward of nearly 200% over learning directly from the global reward. For the air traffic problem, we show a decrease in costs of 25% over learning directly from the global reward.
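The core trick described above is to fit a model G-hat of the global reward and then evaluate it twice per agent, giving a difference reward D_i = G-hat(z) - G-hat(z_{-i}) without re-running the system. A minimal sketch with the linear variant follows; the feature encoding and the "agent absent" default (zeroing the agent's feature) are assumptions for illustration.

```python
import numpy as np

class LinearGlobalRewardModel:
    """Learned linear approximation G_hat of the global reward G,
    as a function of the joint state/action feature vector z."""

    def __init__(self, n_features, lr=0.01):
        self.w = np.zeros(n_features + 1)  # +1 for a bias term
        self.lr = lr

    def predict(self, z):
        return float(self.w @ np.append(z, 1.0))

    def update(self, z, g_observed):
        # Online least-squares step toward the observed global reward.
        phi = np.append(z, 1.0)
        self.w += self.lr * (g_observed - self.w @ phi) * phi

def difference_reward(model, z, i, default=0.0):
    """D_i = G_hat(z) - G_hat(z_{-i}): evaluate the learned model
    with and without agent i's contribution (feature i replaced by
    a default), which is cheap once the model is trained."""
    z_without_i = np.array(z, dtype=float)
    z_without_i[i] = default
    return model.predict(z) - model.predict(z_without_i)
```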
Congestion games offer a perfect environment in which to study the impact of local decisions on global utilities in multiagent systems. What makes such problems particularly interesting is that no individual action is intrinsically "good" or "bad"; rather, combinations of actions lead to desirable or undesirable outcomes. As a consequence, agents need to learn to coordinate their actions with those of other agents, rather than learn a particular set of "good" actions. A congestion game can be studied from two different perspectives: (i) from the top down, where a global utility (e.g., a system-centric view of congestion) specifies the task to be achieved; or (ii) from the bottom up, where each agent has its own intrinsic utility it wants to maximize. In many cases these two approaches are at odds, and agents aiming to maximize their intrinsic utilities drive a system-level utility to poor values. In this paper we extend results on difference utilities, a form of shaped utility that enables multiagent learning in congested, noisy conditions, to study the global behavior that arises from the agents' choices in two types of congestion games. Our key result is that agents aiming to maximize a modified version of their own intrinsic utilities not only perform well in terms of the global utility but also, on average, perform better with respect to their own original utilities. In addition, we show that difference utilities are robust to agents "defecting" and using their own intrinsic utilities, and that performance degrades gracefully with the number of defectors.
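For concreteness, here is a sketch of a difference utility in a bar-problem-style congestion game, assuming the commonly used global utility G(z) = sum_k x_k * exp(-x_k / b) over nights k with attendance x_k and capacity b; the exact utilities studied in the paper may differ.

```python
import numpy as np

def global_utility(attendance, b=6.0):
    """G(z) = sum_k x_k * exp(-x_k / b): each night's contribution
    peaks near the capacity b and decays when overcrowded."""
    x = np.asarray(attendance, dtype=float)
    return float(np.sum(x * np.exp(-x / b)))

def difference_utility(attendance, night_i, b=6.0):
    """D_i = G(z) - G(z_{-i}): the global utility with agent i
    attending night_i, minus the same quantity with agent i removed.
    Maximizing D_i stays aligned with G, yet it is far more
    sensitive to agent i's own choice than G itself."""
    without_i = np.array(attendance, dtype=float)
    without_i[night_i] -= 1
    return global_utility(attendance, b) - global_utility(without_i, b)

# Example: 10 agents spread over 3 nights; the agent on night 0
# evaluates the shaped utility of its current choice.
print(difference_utility([5, 3, 2], night_i=0))
```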