Quantum key distribution (QKD) networks have the potential to be widely deployed in the near future to provide long-term security for data communications. Given the high cost and complexity of QKD networks, multi-tenancy has become a cost-effective pattern for QKD network operations. In this work, we address the online multi-tenant provisioning (On-MTP) problem for QKD networks, in which multiple tenant requests (TRs) arrive dynamically. On-MTP involves scheduling multiple TRs and assigning non-reusable secret keys derived from a QKD network to them, where each TR can be regarded as a high-security-demand organization with a dedicated secret-key demand. Quantum key pools (QKPs) are constructed over the QKD network infrastructure to improve the management efficiency of secret keys. We model the secret-key resources of QKPs and the secret-key demands of TRs using distinct images. To realize efficient On-MTP, we perform a comparative study of heuristic and reinforcement learning (RL) based On-MTP solutions: three heuristics (i.e., random, fit, and best-fit based On-MTP algorithms) are presented, and an RL framework is introduced to realize automatic training of an On-MTP algorithm. The comparative results indicate that, with sufficient training iterations, the RL-based On-MTP algorithm significantly outperforms the presented heuristics in terms of tenant-request blocking probability and secret-key resource utilization.

Index Terms: Quantum key distribution networks, online multi-tenant provisioning, heuristics, reinforcement learning.