2018 IEEE International Conference on Data Mining (ICDM)
DOI: 10.1109/icdm.2018.00077
Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching

Cited by 92 publications (67 citation statements)
References 12 publications
“…While these scale well, they are myopic and, as a result, do not consider the impact of a given assignment on future assignments. The third thread consists of approaches that use Reinforcement Learning (RL) to address the myopia associated with approaches from the second category for the ToD problem (Xu et al. 2018; Lin et al. 2018; Li et al. 2019; Wang et al. 2018; Verma et al. 2017). Past RL work for the ToD problem cannot be extended to solve the RMP, however, because it relies heavily on the assumption that vehicles can only serve one passenger at a time.…”
(mentioning, confidence: 99%)
“…
• MDP: Xu et al. [39] implemented dispatching through a learning-and-planning approach: each vehicle-order pair is valued in consideration of both immediate rewards and future gains in the learning step, and dispatch is solved using a combinatorial optimization algorithm in the planning step.
• DDQN: Wang et al. [36] introduced a double-DQN with spatiotemporal action search. The network architecture is similar to the one described in DQN, except that a selected action space is utilized and network parameters are updated via double-DQN.…”
Section: Compared Methods (mentioning, confidence: 99%)
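The learning-and-planning baseline quoted above reduces dispatch to a bipartite matching problem once each vehicle-order pair has been scored. Below is a minimal sketch of that pattern, assuming a learned per-grid state-value table V and using the Hungarian algorithm for the combinatorial step; the function and field names are illustrative, not from the cited papers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dispatch(vehicle_cells, orders, V, gamma=0.9):
    """vehicle_cells: grid id of each idle vehicle.
    orders: (origin, dest, fare) tuples.
    V: learned per-grid state values (output of the learning step)."""
    score = np.zeros((len(vehicle_cells), len(orders)))
    for i, cell in enumerate(vehicle_cells):
        for j, (_, dest, fare) in enumerate(orders):
            # TD-style edge weight: immediate reward plus discounted future gain
            score[i, j] = fare + gamma * V[dest] - V[cell]
    # Hungarian algorithm on the negated scores maximizes total long-term value
    rows, cols = linear_sum_assignment(-score)
    return list(zip(rows, cols))  # (vehicle index, order index) assignments
```

Because the edge weight includes gamma * V[dest] - V[cell], a matching can prefer a lower-fare order ending in a high-value region over a higher-fare order that strands the vehicle, which is exactly the non-myopic behavior the RL thread is after.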
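The DDQN baseline's update rule is the standard double-DQN target: the online network chooses the next action and the target network evaluates it, which reduces the overestimation bias of vanilla DQN. A generic sketch of that target (standard form only; the spatiotemporal action search from the excerpt is not reproduced here):

```python
import numpy as np

def ddqn_targets(q_online, q_target, s_next, r, done, gamma=0.9):
    """q_online, q_target: callables mapping a state batch to (batch, n_actions) Q-values."""
    a_star = q_online(s_next).argmax(axis=1)              # online net selects the action
    q_eval = q_target(s_next)[np.arange(len(r)), a_star]  # target net evaluates it
    return r + gamma * (1.0 - done) * q_eval              # no bootstrap past terminal states
```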
“…To balance charging station utilization, we need to reduce the number of charging stations with very low or very high CSOR, since a very low CSOR means wasted resources and a very high CSOR means potentially longer waiting times at those stations. Simulation Setup: We adopt a rolling-horizon manner to conduct the simulation, which is widely utilized in vehicle mobility intervention research, e.g., order dispatching of for-hire vehicles [45, 49]. The basic idea of the rolling-horizon manner is that we update the status of all charging stations after scheduling a vehicle, and then the next decision is made based on the updated information.…”
Section: Methods (mentioning, confidence: 99%)
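The rolling-horizon simulation in this excerpt interleaves scheduling and state updates one vehicle at a time. A minimal sketch of that loop, with a deliberately simple least-utilization rule standing in for the paper's actual scheduling policy; the station fields and the selection rule are assumptions for illustration.

```python
def rolling_horizon(vehicles, stations):
    """stations: dict station_id -> {'occupied': int, 'capacity': int}."""
    schedule = []
    for v in vehicles:
        feasible = [s for s, st in stations.items() if st['occupied'] < st['capacity']]
        if not feasible:
            break
        # illustrative rule: send the vehicle to the least-utilized feasible station
        best = min(feasible, key=lambda s: stations[s]['occupied'] / stations[s]['capacity'])
        schedule.append((v, best))
        stations[best]['occupied'] += 1  # update state before the next decision
    return schedule
```

Each decision sees the occupancy left by all earlier assignments, which is the point of the rolling-horizon manner: without the in-loop update, every vehicle would pile onto the station that looked best at the start of the step.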