In next-generation wireless networks, relay-based packet forwarding has emerged as an appealing technique to extend network coverage while maintaining the required service quality. The incorporation of multiple frequency bands, ranging from MHz/GHz to THz frequencies, and their opportunistic and/or simultaneous exploitation by relay nodes can significantly improve system capacity, albeit at the risk of increased packet latency. Since a relay node can use different bands to send and receive packets, there is a pressing need for an efficient channel allocation algorithm that operates without a central oracle. While existing greedy heuristics and game-theoretic techniques, developed for multi-band channel assignment to relay nodes, achieve minimal packet latency under static conditions, their performance drops significantly when network dynamism (i.e., user mobility, non-quasi-static channel conditions) is introduced. Since this problem involves multiple relay nodes, we model it as a multi-stage Markov Decision Process (MDP), for which computing an optimal and stable solution is computationally hard. Because solving the MDP exactly is time-consuming and intractable for relay nodes, we instead approximate the optimal solution in a distributed manner by reformulating the problem as a reinforcement learning-based channel adaptation task in the considered multi-band relay network. We solve this reformulated problem by customizing a Q-learning algorithm that adopts an epsilon-greedy policy. Extensive simulation results demonstrate that the proposed reinforcement learning algorithm outperforms existing methods in terms of transmission time, buffer overflow, and effective throughput. We also provide a convergence analysis of the proposed model by systematically identifying and setting the appropriate parameters.
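The core learning loop described above can be illustrated with a minimal sketch of tabular Q-learning under an epsilon-greedy policy. The environment model here is entirely hypothetical (band count, mean per-band latencies, and the negative-latency reward are illustrative assumptions, not the paper's simulation setup); it only shows the update rule and exploration scheme a relay node might run.

```python
import random

# Hypothetical setup: a relay picks one of NUM_BANDS frequency bands;
# the reward is the negative of a noisy per-band latency (ms).
NUM_BANDS = 3                      # e.g., sub-6 GHz, mmWave, THz (illustrative)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
MEAN_LATENCY = [8.0, 2.0, 5.0]     # assumed mean latencies; band 1 is best

def reward(band, rng):
    # Lower latency -> higher (less negative) reward.
    return -(MEAN_LATENCY[band] + rng.gauss(0, 0.5))

def choose_action(q, rng):
    # Epsilon-greedy: explore a random band with probability EPSILON,
    # otherwise exploit the current best Q-value.
    if rng.random() < EPSILON:
        return rng.randrange(NUM_BANDS)
    return max(range(NUM_BANDS), key=lambda b: q[b])

def train(episodes=5000, seed=0):
    rng = random.Random(seed)
    q = [0.0] * NUM_BANDS          # single-state (bandit-style) Q-table
    for _ in range(episodes):
        a = choose_action(q, rng)
        r = reward(a, rng)
        # Q-learning update; with a single state, max(q) is the bootstrap term.
        q[a] += ALPHA * (r + GAMMA * max(q) - q[a])
    return q

q = train()
best_band = max(range(NUM_BANDS), key=lambda b: q[b])
```

In this toy setting the learned policy settles on the band with the lowest mean latency; the full problem in the paper additionally involves multiple relays and time-varying channel states, which this single-state sketch deliberately omits.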