Deep reinforcement learning (RL) holds considerable promise for addressing a variety of multi-agent problems in dynamic and complex environments. In multi-agent scenarios, most tasks require multiple agents to cooperate, and the number of agents has a negative impact on the training efficiency of reinforcement learning. To this end, we propose a novel method that uses the framework of centralized training and distributed execution and employs parameter sharing among homogeneous agents to replace part of the computation of network parameters during policy evolution. An asynchronous parameter sharing mechanism and a soft sharing mechanism are used to balance the exploration of individual agents against the policy consistency of homogeneous agents. We experimentally validate our approach in different types of multi-agent scenarios. The empirical results show that our method can significantly improve training efficiency in collaborative, competitive, and mixed tasks without affecting performance. INDEX TERMS Multi-agent, reinforcement learning, neural network, parameter sharing, MADDPG, training efficiency.
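The abstract does not specify how its soft sharing mechanism blends parameters, but a plausible minimal form is an exponential interpolation that pulls each homogeneous agent's parameters toward a shared set while leaving room for individual exploration. The function name `soft_share` and the blending rate `tau` are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def soft_share(theta_agent, theta_shared, tau=0.1):
    """Blend one agent's parameter array toward the parameters shared by
    its homogeneous group. A small tau keeps policies loosely consistent
    while preserving each agent's own exploration; tau=1 would copy the
    shared parameters outright (hard sharing)."""
    return tau * theta_shared + (1.0 - tau) * theta_agent

# Example: an agent whose weights are all zero drifts 10% of the way
# toward shared weights of one per update.
blended = soft_share(np.zeros(3), np.ones(3), tau=0.1)
```

Applied asynchronously (only every few updates, or to a subset of agents at a time), the same rule would also sketch the abstract's asynchronous sharing mechanism.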
In this paper, we explore a scalable deep reinforcement learning (DRL) method for multi-agent environments. Due to the explosive growth of the input dimensionality with the number of agents, most existing DRL methods can cope only with single-agent settings or with a small number of agents. To address this problem, we adopt a centralized training with decentralized execution framework, in which an observation embedding is used to mitigate the curse of dimensionality. Perturbations are injected into the parameter space of each agent's actor network to encourage exploration. Experiments demonstrate the effectiveness of the proposed approach in both cooperative and competitive environments. INDEX TERMS Artificial intelligence, multi-agent, deep reinforcement learning, deep deterministic policy gradient, actor-critic, centralized training with decentralized execution, observation embedding, parameter noise.
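Injecting perturbations into an actor's parameter space typically means adding zero-mean Gaussian noise to every weight array before acting, so exploration is driven by a temporarily perturbed policy rather than by noise on the actions. The sketch below assumes that form; `sigma` and the function name are hypothetical, as the abstract does not state the noise model:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_params(params, sigma=0.02, rng=rng):
    """Return a noisy copy of an actor's parameter arrays: each weight
    gets independent Gaussian noise with standard deviation sigma,
    encouraging exploration in parameter space (assumed noise model)."""
    return [p + rng.normal(0.0, sigma, size=p.shape) for p in params]

# Example: perturb a toy two-layer actor's weight matrices.
actor = [np.zeros((4, 8)), np.zeros((8, 2))]
noisy_actor = perturb_params(actor)
```

In adaptive variants (e.g. the parameter-noise scheme popularized for DDPG-style actors), `sigma` is scaled up or down depending on how far the perturbed policy's actions drift from the unperturbed ones.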
In this paper, our goal is to improve the recognition accuracy of battlefield target aggregation behavior while maintaining the low computational cost of spatio-temporal deep neural networks. To this end, we propose a novel 3D-CNN (3D convolutional neural network) model that extends the idea of multi-scale feature fusion to the spatio-temporal domain and enhances the feature extraction ability of the network by combining feature maps from different convolutional layers. To reduce the computational complexity of the network, we further improve the multi-fiber network and establish a 3D-convolution two-stream architecture based on multi-scale feature fusion. Extensive experimental results on simulation data show that our network significantly boosts the efficiency of existing convolutional neural networks for aggregation behavior recognition, achieving state-of-the-art performance on the dataset constructed in this paper.
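One common way to combine feature maps from different convolutional layers, whose spatio-temporal resolutions differ, is to pool each map to a fixed size and concatenate along the channel axis. The numpy sketch below illustrates that generic fusion pattern only; the paper's actual fusion scheme is not specified in the abstract, and the layout `(C, T, H, W)` is an assumption:

```python
import numpy as np

def fuse_multiscale(feature_maps):
    """Global-average-pool each spatio-temporal feature map over its
    time/height/width axes (assumed layout (C, T, H, W)), then
    concatenate the per-layer channel vectors into one fused descriptor.
    Maps from different layers may have different shapes."""
    pooled = [fm.mean(axis=(1, 2, 3)) for fm in feature_maps]  # each -> (C,)
    return np.concatenate(pooled)

# Example: fuse an early 8-channel map with a deeper 16-channel map.
fused = fuse_multiscale([np.ones((8, 4, 16, 16)), np.zeros((16, 2, 8, 8))])
```

In a real network the fusion would operate on learned feature maps inside the model rather than on raw arrays, but the shape bookkeeping is the same.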
Interconnecting multiple data link systems is an urgent problem for wireless control systems; the difficulty lies in the fact that data link messages are multi-source and heterogeneous. By analyzing their sub-domain characteristics, we construct a data link message domain ontology and establish a data link message ontology model based on a Bayesian network (DLMOBN). The model covers the nodes, directed edges, node similarity probability distributions, and related elements, converting multi-source heterogeneous messages into a mathematical model. We propose a data link message ontology mapping algorithm: OWL syntax is used to formally describe the acquired domain ontology; useful information such as concepts, attributes, and instances is extracted and stored in a preset data structure; and a k-means algorithm clusters this information into "clusters", which serve as classification indices that assign each similarity pair to a node in the Bayesian network, passing lower-layer concepts between nodes up to the prior concepts of the upper layer. The semantic distance, attributes, features, and other factors of each similar pair are used to calculate semantic similarities, and the final semantic similarity value is obtained by weighting. Experiments verify that the method improves recall and precision and reduces time complexity.
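The final weighting step described above can be sketched as a convex combination of the per-factor similarity scores. The specific factor weights below are hypothetical placeholders; the abstract does not state how the weights are chosen:

```python
def combined_similarity(sim_distance, sim_attribute, sim_feature,
                        weights=(0.4, 0.3, 0.3)):
    """Combine semantic-distance, attribute, and feature similarities
    (each assumed to lie in [0, 1]) into one final semantic similarity
    value via a weighted sum. The weights are illustrative and should
    sum to 1 so the result stays in [0, 1]."""
    parts = (sim_distance, sim_attribute, sim_feature)
    return sum(w * s for w, s in zip(weights, parts))

# Example: a pair that is close in the ontology but shares few attributes.
score = combined_similarity(0.5, 0.2, 0.8)
```

A pair would then be accepted as a mapping when `score` exceeds a chosen threshold.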
Using expert samples to improve the performance of reinforcement learning (RL) algorithms has become a focus of current research. However, in different application scenarios it is hard to guarantee both the quantity and quality of expert samples, which limits the practical applicability and performance of such algorithms. In this paper, a novel RL decision optimization method is proposed. The method reduces the dependence on expert samples by incorporating a decision-making evaluation mechanism. By introducing supervised learning (SL), our method optimizes the decision making of the RL algorithm using demonstrations or expert samples. Experiments are conducted in the Pendulum and Puckworld scenarios, with representative algorithms such as deep Q-network (DQN) and Double DQN (DDQN) as benchmarks. The results demonstrate that the proposed method can effectively improve the decision-making performance of agents even when expert samples are unavailable.
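Combining a Q-learning objective with a supervised term on expert actions is commonly done with a large-margin loss (as in DQfD): the expert's action must score at least a margin above all others, and the supervised term vanishes gracefully when no demonstration is available. The sketch below assumes that style of combination; `gamma`, `margin`, and `lam` are hypothetical settings, not values from the paper:

```python
import numpy as np

def combined_loss(q_values, action, reward, q_next_max, expert_action=None,
                  gamma=0.99, margin=0.8, lam=0.5):
    """Squared TD loss for one transition, plus an optional large-margin
    supervised term that pushes the expert's action above all others.
    With expert_action=None the objective reduces to pure RL, so the
    method degrades gracefully when expert samples are unavailable."""
    td_target = reward + gamma * q_next_max
    td_loss = (q_values[action] - td_target) ** 2
    if expert_action is None:
        return td_loss  # no demonstration for this transition
    margins = np.full_like(q_values, margin)
    margins[expert_action] = 0.0  # no margin against the expert's own action
    sl_loss = np.max(q_values + margins) - q_values[expert_action]
    return td_loss + lam * sl_loss

# Example: the expert's action (index 1) already dominates by > margin,
# so the supervised term contributes nothing.
q = np.array([0.0, 2.0])
loss = combined_loss(q, action=1, reward=1.0, q_next_max=0.0, expert_action=1)
```

In practice both terms would be computed over minibatches from a replay buffer, with demonstration transitions flagged so only they receive the supervised term.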