The interaction of multiple autonomous agents gives rise to highly dynamic and nondeterministic environments, contributing to the complexity in applications such as automated financial markets, smart grids, or robotics. Due to the sheer number of situations that may arise, it is not possible to foresee and program the optimal behaviour for all agents beforehand. Consequently, it becomes essential for the success of the system that the agents can learn their optimal behaviour and adapt to new situations or circumstances. The past two decades have seen the emergence of reinforcement learning, both in single and multi-agent settings, as a strong, robust and adaptive learning paradigm. Progress has been substantial, and a wide range of algorithms are now available. An important challenge in the domain of multi-agent learning is to gain qualitative insights into the resulting system dynamics. In the past decade, tools and methods from evolutionary game theory have been successfully employed to study multi-agent learning dynamics formally in strategic interactions. This article surveys the dynamical models that have been derived for various multi-agent reinforcement learning algorithms, making it possible to study and compare them qualitatively. Furthermore, new learning algorithms that have been introduced using these evolutionary game theoretic tools are reviewed. The evolutionary models can be used to study complex strategic interactions. Examples of such analysis are given for the domains of automated trading in stock markets and collision avoidance in multi-robot systems. The paper provides a roadmap on the progress that has been achieved in analysing the evolutionary dynamics of multi-agent learning by highlighting the main results and accomplishments.
We introduce α - Rank , a principled evolutionary dynamics methodology, for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov - Conley chains (MCCs). The approach leverages continuous-time and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, in the type of interactions (beyond dyadic), and the type of empirical games (symmetric and asymmetric). Current models are fundamentally limited in one or more of these dimensions, and are not guaranteed to converge to the desired game-theoretic solution concept (typically the Nash equilibrium). α -Rank automatically provides a ranking over the set of agents under evaluation and provides insights into their strengths, weaknesses, and long-term dynamics in terms of basins of attraction and sink components. This is a direct consequence of the correspondence we establish to the dynamical MCC solution concept when the underlying evolutionary model’s ranking-intensity parameter, α , is chosen to be large, which exactly forms the basis of α -Rank. In contrast to the Nash equilibrium, which is a static solution concept based solely on fixed points, MCCs are a dynamical solution concept based on the Markov chain formalism, Conley’s Fundamental Theorem of Dynamical Systems, and the core ingredients of dynamical systems: fixed points, recurrent sets, periodic orbits, and limit cycles. Our α -Rank method runs in polynomial time with respect to the total number of pure strategy profiles, whereas computing a Nash equilibrium for a general-sum game is known to be intractable. We introduce mathematical proofs that not only provide an overarching and unifying perspective of existing continuous- and discrete-time evolutionary evaluation models, but also reveal the formal underpinnings of the α -Rank methodology. We illustrate the method in canonical games and empirically validate it in several domains, including AlphaGo, AlphaZero, MuJoCo Soccer, and Poker.
In this paper, we investigate Reinforcement learning (RL) in multi-agent systems (MAS) from an evolutionary dynamical perspective. Typical for a MAS is that the environment is not stationary and the Markov property is not valid. This requires agents to be adaptive. RL is a natural approach to model the learning of individual agents. These Learning algorithms are however known to be sensitive to the correct choice of parameter settings for single agent systems. This issue is more prevalent in the MAS case due to the changing interactions amongst the agents. It is largely an open question for a developer of MAS of how to design the individual agents such that, through learning, the agents as a collective arrive at good solutions. We will show that modeling RL in MAS, by taking an evolutionary game theoretic point of view, is a new and potentially successful way to guide learning agents to the most suitable solution for their task at hand. We show how evolutionary dynamics (ED) from Evolutionary Game Theory can help the developer of a MAS in good choices of parameter settings of the used RL algorithms. The ED essentially predict the equilibriums outcomes of the MAS where the agents use individual RL algorithms. More specifically, we show how the ED predict the learning trajectories of Q-Learners for iterated games. Moreover, we apply our results to (an extension of) the COllective INtelligence framework (COIN). COIN is a proved engineering approach for learning of cooperative tasks in MASs. The utilities of the agents are re-engineered to contribute to the global utility. We show how the improved results for MAS RL in COIN, and a developed extension, are predicted by the ED.
Swarms of tiny flying robots hold great potential for exploring unknown, indoor environments. Their small size allows them to move in narrow spaces, and their light weight makes them safe for operating around humans. Until now, this task has been out of reach due to the lack of adequate navigation strategies. The absence of external infrastructure implies that any positioning attempts must be performed by the robots themselves. State-of-the-art solutions, such as simultaneous localization and mapping, are still too resource demanding. This article presents the swarm gradient bug algorithm (SGBA), a minimal navigation solution that allows a swarm of tiny flying robots to autonomously explore an unknown environment and subsequently come back to the departure point. SGBA maximizes coverage by having robots travel in different directions away from the departure point. The robots navigate the environment and deal with static obstacles on the fly by means of visual odometry and wall-following behaviors. Moreover, they communicate with each other to avoid collisions and maximize search efficiency. To come back to the departure point, the robots perform a gradient search toward a home beacon. We studied the collective aspects of SGBA, demonstrating that it allows a group of 33-g commercial off-the-shelf quadrotors to successfully explore a real-world environment. The application potential is illustrated by a proof-of-concept search-and-rescue mission in which the robots captured images to find “victims” in an office environment. The developed algorithms generalize to other robot types and lay the basis for tackling other similarly complex missions with robot swarms in the future.
Micro Aerial Vehicles (MAVs) are very suitable for flying in indoor environments, but autonomous navigation is challenging due to their strict hardware limitations. This paper presents a highly efficient computer vision algorithm called Edge-FS for the determination of velocity and depth. It runs at 20 Hz on a 4 g stereo camera with an embedded STM32F4 microprocessor (168 MHz, 192 kB) and uses edge distributions to calculate optical flow and stereo disparity. The stereo-based distance estimates are used to scale the optical flow in order to retrieve the drone's velocity. The velocity and depth measurements are used for fully autonomous flight of a 40 g pocket drone only relying on on-board sensors. This method allows the MAV to control its velocity and avoid obstacles.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.