To address cross-regional customized bus (CB) route planning during the COVID-19 pandemic, we develop a CB route planning method based on an improved Q-learning algorithm. First, we design a sub-regional route planning approach that accounts for commuters' time windows at pick-up and drop-off stops. Second, to find the CB route with the optimal total social travel cost, we improve the traditional Q-learning algorithm in three respects: the state-action pairs, the reward function, and the update rule of the Q-value table. Third, we design a method for setting up CB stops and construct a path impedance function to obtain the optimal operating path between each pair of stops. Finally, we conduct numerical experiments on three CB lines in Beijing. The theoretical and numerical results show that (i) compared with the current situation, the actual operating cost of the optimized route increases slightly, but this increase is offset by the reduction in passengers' travel cost, and the transmission risk of COVID-19 also drops significantly; (ii) the improved Q-learning algorithm effectively solves the problem of data transmission lag and markedly reduces the total social travel cost.

INDEX TERMS Customized bus, route planning, reinforcement learning, Q-learning algorithm, time window.