UAV Autonomous Target Search Based on Deep Reinforcement Learning in Complex Disaster Scene

Wu, Chunxue; Ju, Bobo; Wu, Yan; Lin, Xiao; Xiong, Naixue; Xu, Guangquan; Li, Hongyan; Liang, Xuefeng

doi:10.1109/access.2019.2933002

Cited by 108 publications

(59 citation statements)

References 23 publications


“…$P^{\pi(s)}_{s,s_{t+1}}$ denotes the transition probability unknown in reality, and $\pi(s)$ is the action generated under a specific policy. Based on (17) and (18), the cost functions (11), (12) can be rewritten as:…”

Section: A. Bellman Equation (mentioning)

confidence: 99%

“…In order to achieve the ability to learn automatically, we design the updating steps as following [35]: 1) evaluating result: obtain $V^{i,\pi}(s)$ and $V^{\text{total},\pi}(q)$ according to (17) and (18) based on the policy $\pi$ for all status.…”

Section: A. Bellman Equation (mentioning)

confidence: 99%

“…Bellman equations formulate the optimal conditions for our problem but the transition probability is actually unknown in practice. Therefore, we can't compute (17) and (18) directly. In order to obtain an acceptable result, RL algorithms are taken into consideration.…”

Section: A. Bellman Equation (mentioning)

confidence: 99%
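
The statement above notes that the Bellman equations cannot be evaluated directly when the transition probability is unknown, which is why RL is used. As a minimal sketch of that idea, and not the citing authors' method, tabular Q-learning estimates action values from sampled transitions only. The 5-state chain environment, the function names (`q_learning`, `chain_step`) and all hyperparameters below are illustrative assumptions.

```python
import random


def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning: learns Q(s, a) from samples, without a transition model."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0  # assumption: every episode starts in state 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explore
            else:
                best = max(q[s])  # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s_next, reward, done = step(s, a)
            # Sampled Bellman target: no transition probabilities are needed.
            target = reward if done else reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


def chain_step(s, a):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.

    Reaching state 4 yields reward 1 and ends the episode.
    """
    s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    done = s_next == 4
    return s_next, (1.0 if done else 0.0), done


q_table = q_learning(n_states=5, n_actions=2, step=chain_step)
```

In this toy setting the learned value of moving right from state 3 should exceed the value of moving left, because only the rightward move reaches the rewarded terminal state.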

“…The authors in [17] investigate the computation offloading problem in blockchain empowered mobile edge computing system, where the deep RL algorithm is applied to the computing offloading decision-making process. Moreover, deep RL is also applied in unmanned aerial vehicle autonomous target searching in a complex disaster scene [18], where the superior ability on dynamic programming of deep RL can be observed. In addition, some well-known deep RL algorithms such as SARSA [19], DQN [20]-[22] are investigated and exploited in practical communication systems.…”

Section: Introduction (mentioning)

confidence: 99%


Energy Minimization in D2D-Assisted Cache-Enabled Internet of Things: A Deep Reinforcement Learning Approach

Tang, Zhang, et al. 2020

IEEE Trans. Ind. Inf.


Mobile edge caching (MEC) and device to device (D2D) communications are two potential technologies to resolve traffic overload problems in internet of things (IoT). Previous works usually investigate them separately with MEC for traffic offloading and D2D for information transmission. In this paper, a joint framework consisting of MEC and cache-enabled D2D communications is proposed to minimize the energy cost of systematic traffic transmission, where file popularity and user preference are the critical criteria for small base stations (SBSs) and user devices, respectively. Under this framework, we propose a novel caching strategy where Markov decision process (MDP) is applied to model the requesting behaviours. A novel scheme based on reinforcement learning (RL) is proposed to reveal the popularity of files as well as users' preference. In particular, Q-learning (QL) algorithm and deep Q-network (DQN) algorithm are respectively applied to user devices and SBS due to different complexities of status. To save the energy cost of systematic traffic transmission, users acquire partial traffic through D2D communications based on the cached contents and user distribution. Taking the memory limits, D2D available files and status changing into consideration, the proposed RL algorithm enables user devices and SBS to prefetch the optimal files while learning, which can reduce the energy cost significantly. Simulation results demonstrate the superior energy saving performance of the proposed RL-based algorithm over other existing methods under various conditions.


“…The ant colony algorithm [38][39][40][41][42][43] is used to solve the order of shipments in each cluster and to dispatch the shortest path.…”

Section: Ant Colony Algorithm For TSP Problem (mentioning)

confidence: 99%
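
The statement above uses an ant colony algorithm to order the deliveries within a cluster, which is a travelling salesman problem (TSP). The following is a minimal sketch of a basic ant system for a small TSP, not the citing paper's implementation. The function names (`ant_colony_tsp`, `tour_length`), the parameter values and the four-point example are illustrative assumptions, and the points are assumed to be distinct.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def ant_colony_tsp(points, n_ants=10, n_iters=50, alpha=1.0, beta=2.0,
                   rho=0.1, seed=0):
    """Basic ant system; assumes all points are distinct (no zero distances)."""
    rng = random.Random(seed)
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    tau = [[1.0] * n for _ in range(n)]  # pheromone level on each edge
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Favor edges with more pheromone and shorter distance.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                nxt = rng.choices(cand, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate pheromone, then let each ant deposit along its tour.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
    return best_tour, best_len


# Hypothetical delivery addresses placed on the corners of a unit square.
stops = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
order, length = ant_colony_tsp(stops)
```

The returned `order` is a visiting sequence over the stop indices. Like any ant colony heuristic, it is not guaranteed to be the shortest tour.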

Design and Analysis of the Task Distribution Scheme of Express Center at the End of Modern Logistics

et al. 2019

Electronics


With the rise and improvement of artificial intelligence technology, the express delivery industry has become more intelligent. At the terminal of modern logistics, each dispatch center has hundreds of express mail deliveries to be dispatched every day, and the number of dispatchers is far less than the number of express mail deliveries. How to assign scientific tasks to each courier dispatch is the main target of this paper. The purpose is to make the number of tasks between the various couriers in the express center roughly the same in each cycle, so that there is a more balanced income between the couriers. In the simulation experiment, the delivery addresses are clustered according to the balanced k-means algorithm. Then, the ant colony algorithm is used to plan the delivery order of the express items in each class. Then, the time cost model is established according to the delivery distance of the express items in each class and the delivery mode of the express items to calculate the delivery time cost. Through a large amount of experimental data, the standard deviation of delivery time cost of each courier gradually decreases and tends to stabilize, which suggests that this method has a good effect on the dispatching task assignment of the express center. It can effectively make the delivery workload between the distributors roughly the same, and improve the delivery efficiency of the courier, save energy, and promote sustainable development.


Birds Classification Based on Deep Transfer Learning

et al. 2021

Lecture Notes in Computer Science
