“…After that, based on the Bellman Equation (Jones and Peet, 2021), a new state-value function is developed that converts the single-step delivery reward into a multi-step return, which represents the matching degree between the delivery task and the SAEV. The Back Propagation-Deep Neural Network (BP-DNN) algorithm (Zheng et al., 2022) is adopted to estimate the state-value function from historical trip episode samples. Finally, four simulation test cases are designed to verify the operational performance of the above methodology.…”
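The idea of turning a single-step reward into a multi-step return, then fitting a network to it, can be illustrated with a small sketch. It is not the authors' implementation. The excerpt does not specify the reward, the state features, the discount factor, or the network architecture, so this sketch assumes a Bellman-style discounted return $G_t = r_t + \gamma G_{t+1}$, a one-hidden-layer tanh network trained by full-batch backpropagation on mean squared error, and synthetic episodes. The names `discounted_returns`, `build_dataset`, `make_synthetic_episodes` and `ValueMLP` are hypothetical.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.9):
    """Multi-step return G_t = r_t + gamma * G_{t+1}, computed backward over one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def build_dataset(episodes, gamma=0.9):
    """Stack (state features, return) pairs from a list of (states, rewards) episodes."""
    xs, ys = [], []
    for states, rewards in episodes:
        xs.append(np.asarray(states, dtype=float))
        ys.append(discounted_returns(rewards, gamma))
    return np.vstack(xs), np.concatenate(ys)


def make_synthetic_episodes(n_episodes=50, horizon=5, seed=0):
    """Hypothetical stand-in for historical trip episodes: 3 random state features
    per step and a made-up linear per-step delivery reward."""
    rng = np.random.default_rng(seed)
    weights = np.array([1.0, -0.5, 0.3])  # hypothetical reward weights
    episodes = []
    for _ in range(n_episodes):
        states = rng.uniform(0.0, 1.0, size=(horizon, 3))
        rewards = states @ weights
        episodes.append((states, rewards))
    return episodes


class ValueMLP:
    """One-hidden-layer network V(s) trained by backpropagation on 0.5 * MSE."""

    def __init__(self, n_in, n_hidden=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, size=(n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def predict(self, X):
        """Forward pass; returns predicted values and hidden activations."""
        h = np.tanh(X @ self.W1 + self.b1)
        return (h @ self.W2 + self.b2).ravel(), h

    def train_step(self, X, y):
        """One full-batch gradient-descent step; returns the loss before the update."""
        pred, h = self.predict(X)
        err = pred - y
        n = len(y)
        # Backward pass: output layer, then tanh hidden layer (tanh' = 1 - tanh^2).
        grad_W2 = h.T @ err[:, None] / n
        grad_b2 = np.array([err.mean()])
        dh = (err[:, None] @ self.W2.T) * (1.0 - h ** 2)
        grad_W1 = X.T @ dh / n
        grad_b1 = dh.mean(axis=0)
        # Update only after all gradients are computed from the current weights.
        self.W2 -= self.lr * grad_W2
        self.b2 -= self.lr * grad_b2
        self.W1 -= self.lr * grad_W1
        self.b1 -= self.lr * grad_b1
        return 0.5 * np.mean(err ** 2)


if __name__ == "__main__":
    X, y = build_dataset(make_synthetic_episodes(), gamma=0.9)
    net = ValueMLP(n_in=X.shape[1])
    for _ in range(2000):
        loss = net.train_step(X, y)
    print(f"final training loss: {loss:.4f}")
```

Full-batch gradient descent keeps the backpropagation steps explicit. A real BP-DNN for this task would likely use deeper layers, mini-batches and a tuned optimizer, and its inputs would be actual task and SAEV features rather than random ones.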