Die-stacking technology broadens the spatial diversity of on-chip communication by leveraging through-silicon-via (TSV) integration and wafer bonding. The 3D network-on-chip (NoC), which combines die stacking with a systematic on-chip communication infrastructure, suffers from increased thermal density and unbalanced heat dissipation across its stacked layers, both of which significantly affect chip performance and reliability. Recent studies have focused on runtime thermal management (RTM) techniques to balance the heat distribution, but the performance degradation caused by RTM mechanisms and by unbalanced inter-layer traffic distribution remains unresolved. In this study, we present a Q-function-based traffic- and thermal-aware adaptive routing algorithm that uses reinforcement learning to gradually incorporate updated information into routing-path selection in an RTM-based 3D NoC. The proposed algorithm first collects the deadlock-free directions based on the RTM and topology information. It then applies Q-learning-based decision making, which learns regional traffic information, to improve performance and balance the inter-layer traffic. Simulation results for a 3D NoC with an 8 × 8 × 4 mesh topology show that, compared with previous approaches, the proposed routing algorithm improves throughput by 14.0%–28.2% and achieves, on average, a 24.9% more balanced inter-layer traffic load and a 30.6% more evenly distributed inter-layer thermal dissipation.
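The two-stage decision summarized in the abstract (collect the deadlock-free directions, then choose among them with a learned Q-function) can be illustrated with a minimal sketch. It is not the algorithm evaluated in this study: it assumes the candidate directions have already been filtered by RTM and topology rules, and it substitutes the classic Q-routing update, in which each Q-value estimates the remaining delivery delay. The class `QRouter`, its method names, the direction labels, and the learning rate of 0.5 are all hypothetical.

```python
from collections import defaultdict


class QRouter:
    """Hypothetical per-router Q-table: estimated delay per (destination, direction)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha            # learning rate (assumed value)
        self.q = defaultdict(float)   # unseen entries start at 0.0

    def select(self, dest, candidates):
        """Return the candidate direction with the lowest estimated delay.

        `candidates` is assumed to contain only deadlock-free directions
        already permitted by the RTM and topology rules.
        """
        return min(candidates, key=lambda d: self.q[(dest, d)])

    def best_estimate(self, dest, candidates):
        """Lowest estimated delay over the candidates (reported upstream)."""
        return min(self.q[(dest, d)] for d in candidates)

    def update(self, dest, direction, local_delay, neighbor_estimate):
        """Classic Q-routing update (an assumption, not the paper's rule).

        The target is the delay observed at this hop plus the neighbor's
        own best estimate of the remaining delay to `dest`.
        """
        key = (dest, direction)
        target = local_delay + neighbor_estimate
        self.q[key] += self.alpha * (target - self.q[key])
        return self.q[key]


router = QRouter(alpha=0.5)
dest = (7, 7, 3)                        # a node in an 8 x 8 x 4 mesh
candidates = ["east", "north", "up"]    # assumed deadlock-free set

first_choice = router.select(dest, candidates)
east_q = router.update(dest, "east", local_delay=4.0, neighbor_estimate=2.0)
second_choice = router.select(dest, candidates)
```

With all estimates initialized to zero, ties are broken by list order, so the first packet goes east. After congestion is observed on that port, the east estimate rises and the next selection moves to another candidate. A traffic- and thermal-aware variant would presumably fold regional traffic and temperature information into the observed cost, but the abstract does not specify how.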