The World Wide Web Conference 2019
DOI: 10.1145/3308558.3313433

Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning

Abstract: A fundamental question in any peer-to-peer ridesharing system is how to dispatch users' ride requests to the right drivers in real time, both effectively and efficiently. Traditional rule-based solutions usually work on a simplified problem setting, which requires a sophisticated hand-crafted weight design for either centralized authority control or decentralized multi-agent scheduling systems. Although recent approaches have used reinforcement learning to provide centralized combinatorial optimization algorit…

Cited by 200 publications (120 citation statements)
References 31 publications
“…For all learning methods, following [13], we run 20 episodes for training, store the trained model periodically, and conduct the evaluation on the stored model with 5 random seeds. We compare the performance of different models regarding two criteria, including ADI, computed as the total income in a day, and ORR, calculated by the number of orders taken divided by the number of orders generated.…”
Section: Results Analysis
confidence: 99%
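The two evaluation criteria quoted above can be sketched in a few lines. This is a minimal illustration, not the cited paper's code; the order-record fields (`price`, `served`) are assumptions introduced here.

```python
# ADI: accumulated driver income over a day (sum of prices of served orders).
# ORR: order response rate (orders taken / orders generated).

def evaluate(orders):
    """Compute ADI and ORR from a day's order records."""
    adi = sum(o["price"] for o in orders if o["served"])
    orr = sum(1 for o in orders if o["served"]) / len(orders)
    return adi, orr

orders = [
    {"price": 12.0, "served": True},
    {"price": 8.5,  "served": True},
    {"price": 20.0, "served": False},
    {"price": 5.0,  "served": True},
]
adi, orr = evaluate(orders)
print(adi, orr)  # 25.5 0.75
```

ORR rewards serving many orders even when each is cheap, while ADI rewards total revenue, so the two criteria can pull a dispatch policy in different directions.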
See 1 more Smart Citation
“…For all learning methods, following [13], we run 20 episodes for training, store the trained model periodically, and conduct the evaluation on the stored model with 5 random seeds. We compare the performance of different models regarding two criteria, including ADI, computed as the total income in a day, and ORR, calculated by the number of orders taken divided by the number of orders generated.…”
Section: Results Analysismentioning
confidence: 99%
“…It only randomly assigns available orders to idle vehicles at each timestep. • DQN: Li et al [13] conducted action-value function approximation based on a Q-network. The Q-network is parameterized by an MLP with four hidden layers; ReLU activations are applied between hidden layers, and the final output of the Q-network is linear.…”
Section: Compared Methods
confidence: 99%
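The DQN baseline described above can be sketched as a forward pass through an MLP with four ReLU hidden layers and a linear output head. The layer widths and the plain-Python implementation below are illustrative assumptions; the cited work presumably used a deep-learning framework.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, W, b):
    # Affine map: one output per row of W.
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def init_layer(n_in, n_out, rng):
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def q_network(state, layers):
    """Q-network: four ReLU hidden layers, then a linear output layer."""
    h = state
    for W, b in layers[:-1]:
        h = relu(linear(h, W, b))
    W, b = layers[-1]
    return linear(h, W, b)  # one Q-value per candidate action (order)

rng = random.Random(0)
sizes = [8, 64, 64, 64, 64, 4]  # assumed: state dim 8, four hidden widths, 4 actions
layers = [init_layer(sizes[i], sizes[i + 1], rng) for i in range(len(sizes) - 1)]
q_values = q_network([0.5] * 8, layers)
print(len(q_values))  # 4
```

Greedy dispatching then picks the action (order) with the largest Q-value for each idle vehicle.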
“…RL is intended to capture the interactions among a large volume of vehicles in an adaptive manner. However, due to the curse of dimensionality, in practice it is used in conjunction with an approximation technique, which often degrades the performance of this approach in large-scale fleet management [24,25]. RL methods also require a substantial amount of data to learn an efficient dispatch policy by capturing how to utilize various factors in a given transportation system [18,26-28].…”
Section: Introduction
confidence: 99%
“…For example, the allocation of ride requests can be modeled as a combinatorial optimization problem, and the acceptance of an allocated request can be modeled by a probability distribution [13]. Moreover, an independent and cooperative ride-request allocation algorithm was also proposed [24]. In [38], an RL-based algorithm that can allocate large-scale ride requests in real time was proposed.…”
Section: Introduction
confidence: 99%
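The combinatorial-optimization view quoted above amounts to a one-to-one assignment of idle drivers to open orders. The greedy nearest-pair matching below is only a sketch under simplified assumptions (1-D positions, distance as the sole cost); production dispatchers solve the assignment exactly (e.g. via the Hungarian algorithm) and model rider acceptance probabilistically.

```python
def dispatch(drivers, orders):
    """Greedily match each open order to the closest remaining idle driver."""
    # Enumerate all (distance, driver, order) pairs and process nearest first.
    pairs = sorted(
        (abs(d_pos - o_pos), d, o)
        for d, d_pos in drivers.items()
        for o, o_pos in orders.items()
    )
    used_d, used_o, match = set(), set(), {}
    for _, d, o in pairs:
        if d not in used_d and o not in used_o:
            match[o] = d
            used_d.add(d)
            used_o.add(o)
    return match

drivers = {"d1": 0.0, "d2": 5.0, "d3": 9.0}  # driver id -> position
orders = {"o1": 4.0, "o2": 10.0}             # order id -> pickup position
print(dispatch(drivers, orders))  # {'o1': 'd2', 'o2': 'd3'}
```

Greedy matching is fast but can be arbitrarily worse than the optimal assignment, which is one motivation for the learned dispatch policies discussed in this literature.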
“…However, for centralized approaches, a critical issue is the potential "single point of failure" [18], i.e., a failure of the centralized authority control will fail the whole system [16]. Two other related works that use multiple agents to learn order dispatching are based on mean-field MARL [13] and knowledge transfer [35].…”
Section: Introduction
confidence: 99%