Delay tolerant networks (DTNs), are characterized by their difficulty in establishing end-to-end paths and and large message propagation delays. To control network overhead costs, reduce message delays, and improve delivery rates in DTNs, it is essential to not only delete messages that have reached their destination but also to more precisely determine appropriate relay nodes. Based on the above goals, this paper constructs a multi-copy relay node selection router algorithm based on Q-lambda reinforcement learning with reference to the idea of community division (QLCR). In community division, if a node has the highestdegree, it is considered the core node, and nodes with similar interests and structural properties are divided into a community. Node degree refers to the number of nodes associated with the node, indicating its importance in the network. Structural similarity determines the distance between nodes. The selection of relay nodes considers node degree, interests, and structural similarity. The Q-lambda reinforcement learning algorithm enables each node to learn from the entire network, setting corresponding reward values based on encountered nodes meeting the specified conditions. Through iterative processes, the node with the most cumulative reward value is chosen as the final relay node. Experimental results demonstrate that the proposed algorithm achieves a high delivery rate while maintaining low network overhead and delay.