In a vehicular delay-tolerant network (VDTN), there are no static connections, and the network topology is highly time-varying. This makes the choice of routing protocol critical to network performance. However, traditional routing schemes rarely consider the influence of selfish nodes in a VDTN. Under ideal conditions, every node tries to store and forward as many messages as possible. In practice, selfish nodes may decline to forward messages to other nodes because their resources are limited. Taking selfish nodes into account is therefore necessary for improving VDTN performance. In this paper, we calculate a credit value for each node by recording its behavior during message transfers, so that selfish nodes are not selected as relays. We present an efficient VDTN-specific multi-copy routing algorithm that combines Q-learning with the credit value to determine whether a candidate node is suitable for delivering a message. Because multi-copy routing protocols require large buffers to store messages, we also optimize node buffer management to reduce network congestion. The proposed algorithm is evaluated on several performance metrics, including delivery probability, network overhead, and message latency. It achieves better results across different configurations, with improved delivery probability, lower message latency, and lower network overhead.
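
To make the idea of combining a learned Q-value with a behavior-based credit value concrete, the following is a minimal illustrative sketch and not the algorithm evaluated in this paper. The class `Node`, the constants `ALPHA`, `GAMMA`, `CREDIT_THRESHOLD`, and `Q_THRESHOLD`, the credit definition (fraction of handed-over messages that the neighbor later relayed), and the suitability rule are all assumptions introduced for illustration; only the standard tabular Q-learning update is taken as given.

```python
from collections import defaultdict

# All constants below are assumed values for illustration only.
ALPHA = 0.5             # learning rate
GAMMA = 0.8             # discount factor
CREDIT_THRESHOLD = 0.5  # minimum credit for a neighbor to be trusted
Q_THRESHOLD = 0.0       # minimum Q-value for a neighbor to be useful


class Node:
    """Hypothetical VDTN node that tracks neighbor credit and Q-values."""

    def __init__(self, name):
        self.name = name
        self.q = defaultdict(float)       # q[(destination, neighbor)]
        self.handed = defaultdict(int)    # messages handed to each neighbor
        self.relayed = defaultdict(int)   # of those, how many were relayed on

    def record_handover(self, neighbor, was_relayed):
        """Record one observed transfer and whether the neighbor forwarded it."""
        self.handed[neighbor] += 1
        if was_relayed:
            self.relayed[neighbor] += 1

    def credit(self, neighbor):
        """Assumed credit: fraction of handed-over messages that were relayed."""
        if self.handed[neighbor] == 0:
            return 1.0  # assumption: unknown neighbors start fully trusted
        return self.relayed[neighbor] / self.handed[neighbor]

    def update_q(self, dest, neighbor, reward, neighbor_best_q):
        """Standard Q-learning update for sending toward dest via neighbor."""
        old = self.q[(dest, neighbor)]
        target = reward + GAMMA * neighbor_best_q
        self.q[(dest, neighbor)] = old + ALPHA * (target - old)

    def is_suitable_relay(self, dest, neighbor):
        """Assumed rule: neighbor must be both trustworthy and useful."""
        trusted = self.credit(neighbor) >= CREDIT_THRESHOLD
        useful = self.q[(dest, neighbor)] > Q_THRESHOLD
        return trusted and useful
```

Under this sketch, a neighbor that has a high Q-value for a destination but a history of dropping messages fails the credit check and is rejected as a relay, which is the behavior the credit value is intended to capture.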