We apply a single deep reinforcement learning agent to dynamic virtual network provisioning. When benchmarked against state-of-the-art heuristics, our approach achieves an order of magnitude lower blocking probability. An interpretability analysis provides insight into the agent's use of spectrum resources.