2018 IEEE International Conference on Data Mining (ICDM)
DOI: 10.1109/icdm.2018.00077
Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching

Cited by 92 publications (67 citation statements)
References 12 publications
“…While these scale well, they are myopic and, as a result, do not consider the impact of a given assignment on future assignments. The third thread consists of approaches that use Reinforcement Learning (RL) to address the myopia associated with approaches from the second category for the ToD problem (Xu et al. 2018; Lin et al. 2018; Li et al. 2019; Wang et al. 2018; Verma et al. 2017). Past RL work for the ToD problem cannot be extended to solve the RMP, however, because it relies heavily on the assumption that vehicles can only serve one passenger at a time.…”
(mentioning, confidence: 99%)
“…
• MDP: Xu et al. [39] implemented dispatching through a learning-and-planning approach: each vehicle-order pair is valued in consideration of both immediate rewards and future gains in the learning step, and dispatch is solved using a combinatorial optimization algorithm in the planning step.
• DDQN: Wang et al. [36] introduced a double-DQN with spatiotemporal action search. The network architecture is similar to the one described in DQN, except that a selected action space is utilized and network parameters are updated via double-DQN.…”
Section: Compared Methods (mentioning, confidence: 99%)
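The learning-and-planning baseline quoted above reduces dispatch to a bipartite matching problem once each vehicle-order pair has been scored. Below is a minimal sketch of that pattern, assuming a learned per-grid state-value table V and using the Hungarian algorithm for the combinatorial step; the function and field names are illustrative, not from the cited papers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dispatch(vehicle_cells, orders, V, gamma=0.9):
    """vehicle_cells: grid id of each idle vehicle.
    orders: (origin, dest, fare) tuples.
    V: learned per-grid state values (output of the learning step)."""
    score = np.zeros((len(vehicle_cells), len(orders)))
    for i, cell in enumerate(vehicle_cells):
        for j, (_, dest, fare) in enumerate(orders):
            # TD-style edge weight: immediate reward plus discounted future gain
            score[i, j] = fare + gamma * V[dest] - V[cell]
    # Hungarian algorithm on the negated scores maximizes total long-term value
    rows, cols = linear_sum_assignment(-score)
    return list(zip(rows, cols))  # (vehicle index, order index) assignments
```

Because the edge weight includes gamma * V[dest] - V[cell], a matching can prefer a lower-fare order ending in a high-value region over a higher-fare order that strands the vehicle, which is exactly the non-myopic behavior the RL thread is after.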
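The DDQN baseline's update rule is the standard double-DQN target: the online network chooses the next action and the target network evaluates it, which reduces the overestimation bias of vanilla DQN. A generic sketch of that target (standard form only; the spatiotemporal action search from the excerpt is not reproduced here):

```python
import numpy as np

def ddqn_targets(q_online, q_target, s_next, r, done, gamma=0.9):
    """q_online, q_target: callables mapping a state batch to (batch, n_actions) Q-values."""
    a_star = q_online(s_next).argmax(axis=1)              # online net selects the action
    q_eval = q_target(s_next)[np.arange(len(r)), a_star]  # target net evaluates it
    return r + gamma * (1.0 - done) * q_eval              # no bootstrap past terminal states
```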
“…To balance charging station utilization, we need to reduce the number of charging stations with very low or very high CSOR, since a very low CSOR means wasted resources and a very high CSOR means potentially longer waiting times at those stations. Simulation Setup: We adopt a rolling-horizon manner to conduct the simulation, which is widely utilized in vehicle mobility intervention research, e.g., order dispatching of for-hire vehicles [45, 49]. The basic idea of the rolling-horizon manner is that we update the status of all charging stations after scheduling a vehicle, and then the next decision is made based on the updated information.…”
Section: Methods (mentioning, confidence: 99%)
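The rolling-horizon simulation in this excerpt interleaves scheduling and state updates one vehicle at a time. A minimal sketch of that loop, with a deliberately simple least-utilization rule standing in for the paper's actual scheduling policy; the station fields and the selection rule are assumptions for illustration.

```python
def rolling_horizon(vehicles, stations):
    """stations: dict station_id -> {'occupied': int, 'capacity': int}."""
    schedule = []
    for v in vehicles:
        feasible = [s for s, st in stations.items() if st['occupied'] < st['capacity']]
        if not feasible:
            break
        # illustrative rule: send the vehicle to the least-utilized feasible station
        best = min(feasible, key=lambda s: stations[s]['occupied'] / stations[s]['capacity'])
        schedule.append((v, best))
        stations[best]['occupied'] += 1  # update state before the next decision
    return schedule
```

Each decision sees the occupancy left by all earlier assignments, which is the point of the rolling-horizon manner: without the in-loop update, every vehicle would pile onto the station that looked best at the start of the step.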