As a research hotspot in artificial intelligence, deep reinforcement learning can help a manipulator learn motion skills without a kinematic model. Twin Delayed Deep Deterministic Policy Gradient (TD3) was proposed to suppress the overestimation bias of value estimates in Deep Deterministic Policy Gradient (DDPG) networks. This paper further suppresses this overestimation bias for deep-reinforcement-learning-based learning on multi-degree-of-freedom (DOF) manipulators and proposes Twin Delayed Deep Deterministic Policy Gradient with Rebirth Mechanism (RTD3). Experimental results show that RTD3 is effective on multi-DOF manipulators, improving learning ability by 29.15% over TD3. In addition, a step-by-step reward function is proposed specifically for learning the motion ability of multi-DOF manipulators. The manipulator's learning is guided by treating the task as a continuous decision-making process, and learning efficiency is improved by optimizing experience replay. Finally, to measure a manipulator's point-to-point positioning ability, this paper presents a new evaluation index, energy efficiency distance, based on the characteristics of the continuous decision process; it evaluates the quality of the learned motion ability more comprehensively and fairly.
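Standard TD3 suppresses the overestimation bias mentioned above with a clipped double-Q target: it bootstraps from the smaller of two target critics, after smoothing the target action with clipped noise. The sketch below illustrates only that TD3 baseline step; the abstract does not describe RTD3's rebirth mechanism, so it is not reproduced here. The function name `td3_target`, the plain callables standing in for the target actor and critics, and the default hyperparameters (`gamma`, `noise_std`, `noise_clip`, `max_action`) are illustrative assumptions, not values from the paper.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def td3_target(
    reward: float,
    next_state: Vector,
    done: bool,
    target_actor: Callable[[Vector], Vector],
    target_q1: Callable[[Vector, Vector], float],
    target_q2: Callable[[Vector, Vector], float],
    gamma: float = 0.99,        # assumed discount factor
    noise_std: float = 0.2,     # assumed std of target-policy smoothing noise
    noise_clip: float = 0.5,    # assumed clip range for that noise
    max_action: float = 1.0,    # assumed symmetric action bound
    rng: Optional[random.Random] = None,
) -> float:
    """Clipped double-Q target of standard TD3 (not the RTD3 rebirth mechanism)."""
    rng = rng or random.Random()

    # Target policy smoothing: add clipped Gaussian noise to the target action,
    # then clip the result back into the valid action range.
    smoothed: Vector = []
    for a in target_actor(next_state):
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(-max_action, min(max_action, a + eps)))

    # Clipped double Q: the smaller of the two critic estimates curbs overestimation.
    q_min = min(target_q1(next_state, smoothed), target_q2(next_state, smoothed))

    # Do not bootstrap past a terminal transition.
    return reward + (0.0 if done else gamma * q_min)
```

For example, with toy critics that always return 5.0 and 3.0 and a reward of 1.0, `td3_target(1.0, [0.0], False, lambda s: [0.5], lambda s, a: 5.0, lambda s, a: 3.0)` uses the smaller critic and returns approximately 3.97 (1.0 + 0.99 * 3.0).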
A wireless sensor network (WSN) consists of a large number of inexpensive sensors that communicate over an ad hoc wireless network to collect information about sensed objects in a given area. The acquired information is useful only when the locations of the sensors and objects are known, so localization is one of the most important technologies in WSNs. This paper proposes a weighted Voronoi diagram-based localization scheme (W-VBLS) that extends the Voronoi diagram-based localization scheme (VBLS). First, a node estimates its distances to neighboring beacons from their received signal strength indicator (RSSI) values and groups beacons into sets of three whose distances are similar. Second, for each triangle formed by the node and two beacons of a group, a weighted bisector is calculated. Third, the three bisectors of a group yield an estimated position of the node, which takes the group's largest RSSI value as its weight. Finally, the node's position is calculated as the weighted average of all estimated positions. Simulations show that, compared with the centroid algorithm and VBLS, W-VBLS achieves higher positioning accuracy and lower computational complexity.
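The first and last steps of the scheme above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the RSSI-to-distance model, so `rssi_to_distance` assumes the common log-distance path-loss model with hypothetical parameters (`p0_dbm`, `n`, `d0`); the grouping rule in `group_similar_beacons` (consecutive triples of beacons sorted by distance, within a tolerance `tol`) is a hypothetical reading of "distances are similar"; and `fuse_estimates` assumes RSSI in dBm is converted to milliwatts so that stronger signals receive larger positive weights. The weighted-bisector construction is not reproduced because the abstract does not define it.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def rssi_to_distance(rssi_dbm: float, p0_dbm: float = -40.0, n: float = 2.0, d0: float = 1.0) -> float:
    """Invert the (assumed) log-distance model RSSI = P0 - 10 * n * log10(d / d0)."""
    return d0 * 10 ** ((p0_dbm - rssi_dbm) / (10.0 * n))


def group_similar_beacons(distances: Dict[str, float], tol: float) -> List[Tuple[str, ...]]:
    """Hypothetical grouping rule: sort beacons by estimated distance and keep
    each run of three consecutive beacons whose distances differ by at most tol."""
    ordered = sorted(distances, key=distances.get)
    groups: List[Tuple[str, ...]] = []
    for i in range(len(ordered) - 2):
        trio = ordered[i:i + 3]
        if distances[trio[-1]] - distances[trio[0]] <= tol:
            groups.append(tuple(trio))
    return groups


def fuse_estimates(estimates: List[Tuple[Point, float]]) -> Point:
    """Weighted average of per-group position estimates.

    Each entry is (estimated_position, rssi_dbm). The dBm value is converted to
    milliwatts (an assumption) so that a stronger signal gives a larger weight.
    """
    weights = [10 ** (rssi / 10.0) for _, rssi in estimates]
    total = sum(weights)
    x = sum(w * p[0] for w, (p, _) in zip(weights, estimates)) / total
    y = sum(w * p[1] for w, (p, _) in zip(weights, estimates)) / total
    return (x, y)
```

With these assumed parameters, an RSSI of -60 dBm maps to a distance of 10 m, and fusing an estimate at (0, 0) heard at -40 dBm with one at (10, 0) heard at -60 dBm gives a point close to the origin, because the stronger signal dominates the weighted average.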