Hybrid pointer networks for traveling salesman problems optimization

Stohy, Ahmed; Abdelhakam, Heba-Tullah; Ali, Sayed; Elhenawy, Mohammed; Hassan, Abdallah; Masoud, Mahmoud; Glaser, Sébastien; Rakotonirainy, Andry

doi:10.1371/journal.pone.0260995

Cited by 12 publications

(6 citation statements)

References 14 publications

“…In this section, we reformulate the PDP as a reinforcement learning (RL) problem, which is followed by the development of a model based on the encoder and decoder structure to learn node selection process for solution construction empowered by Hybrid pointer networks (HPN) [10].…”

Section: Methods (mentioning)

confidence: 99%

“…In order to learn policy π, following several previous studies [10] where the HPN model was built upon the pointer networks, we built a policy network with an encoder-decoder structure. Given the features of PDP, the HPN was expected to learn the link between the nodes of various roles, allowing the precedence constraint to be captured intrinsically.…”

Section: Methods (mentioning)

confidence: 99%

“…To address these challenges, in the present study, we propose using a deep reinforcement learning-based strategy to solve PDP that integrates with Hybrid Pointer Networks [10]. The DRL’s policy network includes an encoder-decoder structure and learns to design a solution by iteratively selecting a pickup or delivery point at each time step.…”

Section: Introduction (mentioning)

confidence: 99%

Solving pickup and drop-off problem using hybrid pointer networks with deep reinforcement learning

et al. 2022

Self Cite

In this study, we propose a general method for tackling the Pickup and Drop-off Problem (PDP) using Hybrid Pointer Networks (HPNs) and Deep Reinforcement Learning (DRL). Our aim is to reduce the overall tour length traveled by an agent while remaining within the truck’s capacity restrictions and adhering to the node-to-node relationship. Under these constraints, the agent does not allow any drop-off point to be serviced if the truck is empty; conversely, if the truck is full, the agent does not allow any products to be picked up from pickup points. In our approach, this challenge is solved using machine learning-based models. Using HPNs as our primary model allows us to capture more complicated node interactions, which helps us reach state-of-the-art outcomes. Our experimental results demonstrate the effectiveness of the proposed neural network, as we achieve state-of-the-art results for this problem compared with existing models. We deal with two types of demand patterns in a single-commodity problem. In the first pattern, all demands are assumed to sum to zero (i.e., we have an equal number of pickup and drop-off items). In the second pattern, we have an unequal number of pickup and drop-off items, which is closer to practical applications such as bike-sharing system rebalancing. Our data, models, and code are publicly available at Solving Pickup and Dropoff Problem Using Hybrid Pointer Networks with Deep Reinforcement Learning.
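The capacity rule in the abstract above (no drop-off when the truck is empty, no pickup when it is full) is commonly enforced in constructive neural solvers by masking infeasible nodes before the next node is chosen. The following is a minimal sketch of such a mask, not the authors' implementation. The function name `feasible_mask`, the sign convention (positive demand is a pickup, negative demand is a drop-off), and the generalization to arbitrary item counts per node are assumptions made for illustration.

```python
from typing import List


def feasible_mask(
    demands: List[int],
    visited: List[bool],
    load: int,
    capacity: int,
) -> List[bool]:
    """Return True for every node the agent may visit next.

    Assumed convention (hypothetical, not taken from the paper):
    demand > 0 is a pickup, demand < 0 is a drop-off.
    A node is feasible if it is unvisited and serving it keeps the
    truck load within [0, capacity].
    """
    mask = []
    for demand, seen in zip(demands, visited):
        new_load = load + demand
        mask.append((not seen) and 0 <= new_load <= capacity)
    return mask


# Example: two pickup nodes and two drop-off nodes, truck capacity 2.
demands = [1, -1, 2, -2]
unvisited = [False, False, False, False]
empty_truck = feasible_mask(demands, unvisited, load=0, capacity=2)
full_truck = feasible_mask(demands, unvisited, load=2, capacity=2)
```

With an empty truck only the pickup nodes remain selectable, and with a full truck only the drop-off nodes do. In a pointer-network decoder, a mask of this kind would typically be applied to the attention scores before the softmax.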

“…A DQN could also be given a graph that describes the paths between cities in the U.S. and the cost of traveling between any two adjacent cities (i.e. the traveling salesman problem [7]). If a given graph has N cities, the computational complexity of this NP-complete problem is O(N^2 2^N).…”

Section: Introduction (mentioning)

confidence: 99%
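The O(N^2 2^N) figure quoted in the citation statement above matches the running time of the Held-Karp dynamic program for exact TSP. The sketch below illustrates where the bound comes from: there are on the order of N·2^N (subset, end city) states, and each is resolved by scanning up to N predecessors. The function name `held_karp` and the small distance matrix are illustrative assumptions, not taken from the cited works.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def held_karp(dist: List[List[float]]) -> float:
    """Length of the shortest tour that starts and ends at city 0."""
    n = len(dist)
    if n == 1:
        return 0.0
    # best[(mask, j)]: shortest path from city 0 through exactly the
    # cities in bitmask `mask` (cities 1..n-1 only), ending at city j.
    best: Dict[Tuple[int, int], float] = {}
    for j in range(1, n):
        best[(1 << j, j)] = dist[0][j]
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask & ~(1 << j)  # same subset without city j
                best[(mask, j)] = min(
                    best[(prev, k)] + dist[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))


# Illustrative symmetric 4-city instance.
example = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
shortest = held_karp(example)  # tour 0-1-3-2-0 has length 80
```

The exponential number of states is why exact methods stop scaling beyond a few dozen cities, which is the motivation for learned heuristics such as pointer networks.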

Computer Science and Machine Learning Trends 2023

2023

Mnih's seminal deep reinforcement learning paper that applied a Deep Q-network to Atari video games demonstrated the importance of a replay buffer and a target network. Though the pair were required for convergence, the use of the replay buffer came at a significant computational cost. With each new sample generated by the system, the targets in the mini batch buffer were continually recomputed. We propose an alternative that eliminates the target recomputation called TAO-DQN (Target Accelerated Optimization-DQN). Our approach focuses on a new replay buffer algorithm that lowers the computational burden. We implemented this new approach on three experiments involving environments from the OpenAI gym. This resulted in convergence to better policies in fewer episodes and less time. Furthermore, we offer a mathematical justification for our improved convergence rate.

“…Ma et al (2019) [15] used hierarchical RL for training in a graph pointer network for TSP. Stohy et al (2021) [16] used an actor-critic method for training in a hybrid pointer network. These algorithms use RL to train neural networks by testing a large number of small-scale TSP datasets, which consumes a great deal of time and resources, and the accuracy of the solution is not ideal.…”

Section: Introduction (mentioning)

confidence: 99%

Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem

2023

In this paper, a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm is proposed that provides a reinforcement learning (RL) framework combined with local search to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. Considering its problems of slow convergence and low accuracy, four strategies within the QL framework are designed first: the weighting function-based reward matrix, the power function-based initial Q-table, a self-adaptive ε-beam search strategy, and a new Q-value update formula. Then, a self-adaptive beam search Q-learning (ABSQL) algorithm is designed. To solve the problem that the sub-route is not fully optimized in the ABSQL algorithm, a dynamic sub-route optimization strategy is introduced outside the QL framework, and then the DSRABSQL algorithm is designed. Experiments are conducted to compare QL, ABSQL, DSRABSQL, our previously proposed variable neighborhood discrete whale optimization algorithm, and two advanced reinforcement learning algorithms. The experimental results show that DSRABSQL significantly outperforms the other algorithms. In addition, two groups of algorithms are designed based on the QL and DSRABSQL algorithms to test the effectiveness of the five strategies. From the experimental results, it can be found that the dynamic sub-route optimization strategy and self-adaptive ε-beam search strategy contribute the most for small-, medium-, and large-scale instances. At the same time, collaboration exists between the four strategies within the QL framework, which increases with the expansion of the instance scale.
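For orientation, the baseline that DSRABSQL builds on is tabular Q-learning. The sketch below shows the standard textbook update rule only. It is not the paper's new Q-value update formula, which the abstract does not state. The function name `q_update`, the table layout (a list of rows indexed by current city, then next city), and the use of negative distance as reward are assumptions made for illustration.

```python
from typing import List


def q_update(
    q: List[List[float]],
    state: int,
    action: int,
    reward: float,
    next_state: int,
    next_actions: List[int],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one standard Q-learning update and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    `next_actions` lists the cities still unvisited from `next_state`;
    when it is empty the episode has ended and the bootstrap term is 0.
    """
    best_next = max((q[next_state][a] for a in next_actions), default=0.0)
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


# Example: two cities, moving from city 0 to city 1 costs 1 unit
# (hypothetical reward = negative distance).
q_table = [[0.0, 0.0], [0.0, 2.0]]
new_value = q_update(q_table, 0, 1, -1.0, 1, [0, 1], alpha=0.5, gamma=1.0)
```

In a TSP setting the state is usually the current city and the action is the next city to visit, so restricting `next_actions` to unvisited cities plays the same role as a feasibility mask.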
