Packet routing is a fundamental problem in wireless networks in which routers decide the next hop for each packet so as to deliver it to its destination as quickly as possible. To overcome the suboptimal forwarding paths caused by the fixed forwarding mode of geographic location-based routing algorithms, we investigate a new efficient packet routing strategy that combines Deep Reinforcement Learning (DRL), specifically Proximal Policy Optimization (PPO), to minimize hop counts and reduce the probability of encountering "routing holes" while forwarding packets in complex networks. Each node in our network makes its own routing decisions and learns a policy that not only retains the efficiency of the greedy forwarding mode but also reduces the use of the perimeter forwarding mode. The DRL agent learns from the status of the packet being transmitted and information about neighbor nodes within communication range. Extensive simulations over dynamic network topologies and varying numbers of nodes show that our routing agent learns a policy that outperforms the Greedy Perimeter Stateless Routing (GPSR) protocol in terms of average packet delivery rate and hop count. The performance of PPO in a large-scale action space is also verified, which provides a basis for future research combining PPO with packet routing optimization.
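The abstract does not spell out the training objective, so the following is only a minimal sketch of how a PPO clipped-surrogate update over a discrete next-hop action space might look; the class and function names, the state encoding, and the network sizes are all hypothetical, not the paper's implementation (PyTorch is used for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: the state could encode the packet's destination and the
# positions of neighbor nodes within communication range; each action indexes
# one candidate next-hop neighbor.
class NextHopPolicy(nn.Module):
    def __init__(self, state_dim, max_neighbors):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, max_neighbors),  # one logit per candidate next hop
        )

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

def ppo_loss(policy, states, actions, old_log_probs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (Schulman et al., 2017)."""
    dist = policy(states)
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Negated because optimizers minimize; PPO maximizes the surrogate.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```

The clipping keeps each update close to the behavior policy, which is one reason PPO remains stable as the action space (here, the neighbor set) grows.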
The exponential explosion of joint actions and the cost of massive data collection are two main challenges in multiagent reinforcement learning algorithms with centralized training. To overcome these problems, this paper proposes a model-free and fully decentralized actor-critic multiagent reinforcement learning algorithm based on message diffusion. The agents are assumed to be placed in a time-varying communication network. Each agent observes only part of the global state and joint actions, so it must obtain and share information with others over the network. In the proposed algorithm, agents hold local estimates of the global state and joint actions and update them using local observations and messages received from neighbors. Under the hypothesis of global value decomposition, the gradient of the global objective function with respect to an individual agent's parameters is derived. The convergence of the proposed algorithm with linear function approximation is guaranteed by stochastic approximation theory. In the experiments, the proposed algorithm was applied to a multiagent passive location task and achieved superior performance compared to state-of-the-art algorithms.
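The abstract does not give the update rule, but message diffusion over a time-varying network is commonly realized as a consensus-style mixing step. Below is a minimal sketch under that assumption, using row-normalized mixing weights built from the instantaneous communication graph; the function name, the mixing scheme, and the blending weight are illustrative guesses, not the paper's algorithm:

```python
import numpy as np

def diffusion_step(estimates, adjacency, local_obs, alpha=0.5):
    """One consensus-style update of per-agent estimates of a global quantity.

    estimates: (n_agents, dim) current local estimates
    adjacency: (n_agents, n_agents) 0/1 communication graph at this time step
               (time-varying: pass a different matrix on each call)
    local_obs: (n_agents, dim) each agent's fresh partial observation
    alpha:     weight given to fresh local information
    """
    # Add self-loops and row-normalize the graph into mixing weights.
    a = adjacency + np.eye(len(adjacency))
    w = a / a.sum(axis=1, keepdims=True)
    mixed = w @ estimates  # each agent averages messages from its neighbors
    return (1.0 - alpha) * mixed + alpha * local_obs
```

Repeating this step lets local information spread through the graph, so each agent's estimate tracks the global state without any central collector.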
Aiming at countering Unmanned Aerial Vehicle (UAV) swarms, this paper studies mechanisms for detecting and suppressing the emergence of cooperative flight. Cooperative flight is one of the critical operations for UAV swarms in both military and civilian applications: it allows individual UAVs to adjust their velocities in a distributed manner so that they head toward a common destination while avoiding collisions. This process can be viewed as emergence in a complex system. An emergence detection algorithm based on double thresholds is proposed. It simultaneously monitors the cooperative flight process and system connectivity to accurately identify the occurrence, achievement, or failure of cooperative flight, which provides a solid prerequisite for the suppression mechanism. For suppression, in-band radio interference is designed under an average-power constraint, and its effect is modeled in terms of degrading the communication performance of the target system. It is found that low-intensity continuous interference can effectively delay the cooperative flight process while offering better concealment, whereas medium-intensity continuous interference can rapidly stop that process. Based on this analysis, and for the first time, two countermeasures against a UAV swarm's cooperative flight are designed, targeting the operational intents of delaying and disrupting the target swarm, respectively. Simulation results show the effectiveness of the countermeasures.
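The paper's double-threshold detector is not specified in the abstract. As a rough sketch of the idea, one could monitor a Vicsek-style velocity-alignment order parameter as the cooperation signal and graph connectivity as the second monitored quantity; the threshold values, the choice of order parameter, and all names below are illustrative assumptions:

```python
import numpy as np

def order_parameter(velocities):
    """Vicsek-style alignment: 1 = fully aligned headings, ~0 = disordered."""
    units = velocities / np.linalg.norm(velocities, axis=1, keepdims=True)
    return np.linalg.norm(units.mean(axis=0))

def is_connected(positions, comm_range):
    """Check connectivity of the communication graph by graph traversal."""
    n = len(positions)
    dists = np.linalg.norm(positions[:, None] - positions[None, :], axis=2)
    adj = dists <= comm_range
    seen, stack = {0}, [0]
    while stack:
        for j in np.flatnonzero(adj[stack.pop()]):
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == n

def detect(velocities, positions, comm_range, phi_low=0.3, phi_high=0.9):
    """Double-threshold classification of the cooperative-flight process."""
    phi = order_parameter(velocities)
    if not is_connected(positions, comm_range) or phi < phi_low:
        return "failure"      # swarm split, or alignment collapsed
    if phi >= phi_high:
        return "achieved"     # cooperative flight has emerged
    return "in progress"      # occurrence detected, not yet achieved
```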
Passive location systems receive electromagnetic waves at one or more base stations to locate transmitters and are widely used in security applications. However, the geometric configuration of the stations can greatly affect positioning precision. In the literature, the geometry of a passive location system is mainly designed using empirical models. These empirical models, which struggle to track the sophisticated electromagnetic environment of the real world, result in suboptimal geometric configurations and low positioning precision. To capture the characteristics of complicated electromagnetic environments and improve positioning performance, this paper proposes a novel geometry optimization method based on multiagent reinforcement learning. In the proposed method, agents learn to optimize the geometry cooperatively by factorizing the team value function into agentwise value functions. To facilitate cooperation and address data transmission constraints, the data sent from the central station to the vice stations are constrained to keep communications concise and effective. Empirical results on direct position determination systems show that the agents find better geometric configurations than existing methods in complicated electromagnetic environments.
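The abstract says the team value function is factorized into agentwise value functions but not how. The simplest such factorization is additive, in the style of value decomposition networks (VDN); the sketch below uses that assumption and models the constrained central-station message as a short vector appended to each local observation. All shapes, names, and the additive form are hypothetical:

```python
import torch
import torch.nn as nn

class AgentQ(nn.Module):
    """Per-agent (e.g., per-station) utility over its candidate actions."""
    def __init__(self, obs_dim, n_actions, msg_dim):
        super().__init__()
        # Illustrative assumption: the size-limited message from the central
        # station is concatenated to the agent's local observation.
        self.net = nn.Sequential(
            nn.Linear(obs_dim + msg_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs, msg):
        return self.net(torch.cat([obs, msg], dim=-1))

def team_q(agent_q_values, actions):
    """Additive factorization: Q_team = sum_i Q_i(o_i, a_i)."""
    chosen = [q.gather(-1, a.unsqueeze(-1)).squeeze(-1)
              for q, a in zip(agent_q_values, actions)]
    return torch.stack(chosen).sum(dim=0)
```

An additive decomposition lets the team value be trained centrally while each station acts greedily on its own Q-values at execution time.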
Reinforcement learning has recently made great progress in challenging domains such as the board game Go and the real-time strategy game StarCraft II. Policy-gradient-based reinforcement learning methods have become mainstream owing to their effectiveness and simplicity in both discrete and continuous scenarios. However, policy gradient methods commonly involve function approximation and work in an on-policy fashion, which leads to high variance and low sample efficiency. This paper introduces a novel policy gradient method that improves sample efficiency via a pair of trajectory-based prioritized replay buffers and reduces training variance with a target network whose weights are updated in a "soft" manner. We evaluate our method on the OpenAI Gym suite of reinforcement learning tasks, and the results show that the proposed method learns more steadily and achieves higher performance than existing methods.
INDEX TERMS: Reinforcement learning, policy gradient, replay buffer, distributed RL.
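Neither the buffer structure nor the update rule is detailed in the abstract. As a rough sketch, trajectory-level prioritized sampling and a Polyak-style "soft" target update might look as follows; the paper uses a pair of such buffers, but a single illustrative one is shown, and all names are assumptions:

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.005):
    """Polyak ("soft") update: target <- (1 - tau) * target + tau * online."""
    for k in target_params:
        target_params[k] = (1.0 - tau) * target_params[k] + tau * online_params[k]

class TrajectoryReplay:
    """Replay buffer that stores whole trajectories and samples by priority."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.trajectories, self.priorities = [], []
        self.rng = np.random.default_rng(seed)

    def add(self, trajectory, priority):
        if len(self.trajectories) >= self.capacity:
            self.trajectories.pop(0)   # evict the oldest trajectory
            self.priorities.pop(0)
        self.trajectories.append(trajectory)
        self.priorities.append(priority)

    def sample(self):
        p = np.asarray(self.priorities, dtype=float)
        idx = self.rng.choice(len(p), p=p / p.sum())
        return self.trajectories[idx]
```

Sampling whole trajectories (rather than single transitions) preserves the temporal structure a policy gradient estimator needs, while the soft target update smooths the bootstrap target to keep variance down.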