Air traffic flow management is one of the fundamental challenges facing the Federal Aviation Administration (FAA) today. The FAA estimates that in 2005 alone, there were over 322,000 hours of delays at a cost to the industry in excess of three billion dollars. Finding reliable and adaptive solutions to the flow management problem is of paramount importance if the Next Generation Air Transportation Systems are to achieve the stated goal of accommodating three times the current traffic volume. This problem is particularly complex as it requires the integration and/or coordination of many factors, including new data (e.g., changing weather information), potentially conflicting priorities (e.g., different airlines), limited resources (e.g., air traffic controllers) and very heavy traffic volume (e.g., over 40,000 flights over the US airspace). In this paper we use FACET, an air traffic flow simulator developed at NASA and used extensively by the FAA and industry, to test a multi-agent algorithm for traffic flow management. An agent is associated with a fix (a specific location in 2D space), and its action consists of setting the separation required among the airplanes going through that fix. Agents use reinforcement learning to set this separation, and their actions speed up or slow down traffic to manage congestion. Our FACET-based results show that agents receiving personalized rewards reduce congestion by up to 45% over agents receiving a global reward and by up to 67% over a current industry approach (Monte Carlo estimation).
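To make the agent design described above concrete, the sketch below shows one plausible way a fix agent could learn a separation setting with a simple stateless reinforcement learner. This is an illustrative assumption rather than the paper's implementation: the class name FixAgent, the candidate miles-in-trail values, and the epsilon-greedy update rule are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical sketch of a single fix agent: a stateless learner that picks a
# miles-in-trail (MIT) separation value and updates an action-value estimate
# from the reward the traffic simulation returns. Details are illustrative.
class FixAgent:
    def __init__(self, separations=(0, 5, 10, 15, 20, 25), alpha=0.5, epsilon=0.1):
        self.separations = separations      # candidate MIT values (nautical miles)
        self.alpha = alpha                  # learning rate
        self.epsilon = epsilon              # exploration rate
        self.q = defaultdict(float)         # value estimate per separation action

    def choose_separation(self):
        # epsilon-greedy selection over the discrete separation actions
        if random.random() < self.epsilon:
            return random.choice(self.separations)
        return max(self.separations, key=lambda a: self.q[a])

    def update(self, separation, reward):
        # simple stateless value update toward the observed reward
        self.q[separation] += self.alpha * (reward - self.q[separation])
```

In use, each fix would hold one such agent, call choose_separation() before a simulation episode, and call update() with whichever reward signal (global or personalized) is being evaluated.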
The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Though final system performance is the best indicator of the suitability of a given reward structure, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties that promote coordination among the agents and provide agents with strong signal-to-noise ratios). This step is particularly helpful in continuous, dynamic, stochastic domains that are ill-suited to the simple table backup schemes commonly used in TD(λ)/Q-learning, where the effectiveness of the reward structure is difficult to distinguish from the effectiveness of the chosen learning algorithm. In this paper, we present a new reward evaluation method that provides a visualization of the tradeoff between the level of coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is only a function of the problem domain and the agents' reward structure. We use this reward property visualization method to determine an effective reward without performing extensive simulations. We then test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and take noisy actions (e.g., the agents' movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two-order-of-magnitude speedup in selecting good rewards, compared to running a full simulation. In addition, this method facilitates the design and analysis of new rewards tailored to the observational limitations of the domain, providing rewards that combine the best properties of traditional rewards.
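As a rough illustration of the two reward properties discussed above, the sketch below estimates an alignment score (how often an agent's reward moves in the same direction as the global reward) and a signal-to-noise ratio (how sensitive the agent's reward is to its own action versus the other agents' actions) by sampling perturbed joint actions. The function names and the sampling scheme are assumptions introduced here for illustration; the paper's actual estimator and visualization may differ.

```python
# Illustrative estimator of reward alignment and learnability for agent i.
# G: global reward function over a joint action; g_i: agent i's reward;
# sample_joint_action, perturb_agent_i, perturb_others: domain-specific
# callables supplied by the user (hypothetical interface).
def reward_properties(G, g_i, sample_joint_action, perturb_agent_i,
                      perturb_others, n_samples=1000):
    aligned = 0
    signal = 0.0
    noise = 0.0
    for _ in range(n_samples):
        z = sample_joint_action()
        z_i = perturb_agent_i(z)          # only agent i's action changes
        z_o = perturb_others(z)           # only the other agents' actions change
        dG = G(z_i) - G(z)
        dg = g_i(z_i) - g_i(z)
        if dG * dg > 0:
            aligned += 1                  # g_i moved the same way as G
        signal += abs(dg)                 # sensitivity to agent i's own action
        noise += abs(g_i(z_o) - g_i(z))   # sensitivity to the other agents
    alignment = aligned / n_samples
    learnability = signal / max(noise, 1e-12)
    return alignment, learnability
```

Plotting alignment against learnability for a set of candidate rewards gives the kind of tradeoff visualization the abstract refers to, without running a full learning simulation for each candidate.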
Evolutionary computation can be a powerful tool in creating a control policy for a single agent receiving local continuous input. This paper extends single-agent evolutionary computation to multi-agent systems, where a collection of agents strives to maximize a global fitness evaluation function that rates the performance of the entire system. This problem is solved in a distributed manner, where each agent evolves its own population of neural networks that are used as the control policies for the agent. Each agent evolves its population using its own agent-specific fitness evaluation function. We propose to create these agent-specific evaluation functions using the theory of collectives to avoid the coordination problem in which each agent evolves a population that maximizes its own fitness function, yet the system as a whole achieves low values of the global fitness function. Instead, we ensure that each fitness evaluation function is both "aligned" with the global evaluation function and "learnable," i.e., the agents can readily see how their behavior affects their evaluation function. We then show how these agent-specific evaluation functions outperform global evaluation methods by up to 600% in a domain where a set of rovers attempts to maximize the amount of information observed while navigating through a simulated environment.
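A standard agent-specific evaluation from the theory of collectives is the difference evaluation, which scores an agent by the global evaluation with and without that agent's contribution; it is aligned with the global function by construction and is typically more learnable because it filters out the other agents' effects. The sketch below is a minimal, hypothetical rendering; how an agent is "removed" (the counterfactual) is domain-specific and is left here as a placeholder callable.

```python
# Difference evaluation D_i(z) = G(z) - G(z_{-i}), sketched under the
# assumption that the caller supplies a counterfactual operator that removes
# or replaces agent i's contribution to the joint state z.
def difference_evaluation(G, joint_state, remove_agent_i):
    z = joint_state
    z_minus_i = remove_agent_i(z)     # counterfactual: agent i absent or replaced
    return G(z) - G(z_minus_i)        # credit agent i only for its own contribution
```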
Abstract. Intelligent air traffic flow management is one of the fundamental challenges facing the Federal Aviation Administration (FAA) today. FAA estimates put delays induced by weather, routing decisions and airport conditions at 1,682,700 hours in 2007 [18], resulting in a staggering economic loss of over $41 billion [42]. New solutions to the flow management problem are needed to accommodate the threefold increase in air traffic anticipated over the next two decades. Indeed, this is a complex problem where the interactions of changing conditions (e.g., weather), conflicting priorities (e.g., different airlines), limited resources (e.g., air traffic controllers) and heavy volume (e.g., over 40,000 flights over the US airspace) demand an adaptive and robust solution. In this paper we explore a multiagent algorithm where agents use reinforcement learning to reduce congestion through local actions. Each agent is associated with a fix (a specific location in 2D space) and has one of three actions: setting separation between airplanes, ordering ground delays or performing reroutes. We simulate air traffic using FACET, an air traffic flow simulator developed at NASA and used extensively by the FAA and industry. Our FACET simulations on both artificial and real historical data from the Chicago and New York airspaces show that agents receiving personalized rewards reduce congestion by up to 80% over agents receiving a global reward and by up to 90% over a current industry approach (Monte Carlo estimation).
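As a hedged illustration of how a personalized (difference-style) reward could be computed from congestion counts, the sketch below penalizes traffic exceeding each sector's capacity and credits an agent only with the portion of that penalty it is responsible for. The quadratic penalty, the sector-count interface, and the function names are assumptions for illustration; the exact congestion and delay terms used in the paper may differ.

```python
# Illustrative congestion objective and personalized (difference-style) reward,
# assuming per-sector traffic counts and capacities are available from the
# simulator. All names and the penalty form are hypothetical.
def congestion_penalty(sector_counts, capacities, weight=1.0):
    # penalize only the traffic that exceeds each sector's capacity
    return sum(weight * max(0, n - c) ** 2
               for n, c in zip(sector_counts, capacities))

def global_reward(sector_counts, capacities):
    return -congestion_penalty(sector_counts, capacities)

def difference_reward(sector_counts, counts_without_agent, capacities):
    # reward the agent for the congestion it is personally responsible for:
    # G(z) - G(z without the agent's traffic)
    return (global_reward(sector_counts, capacities)
            - global_reward(counts_without_agent, capacities))
```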