The deployment of beyond-5G and 6G network infrastructures will enable highly dynamic services requiring stringent Quality of Service (QoS). Supporting such services in today's transport networks will require high flexibility and automation to operate in near real-time and reduce overprovisioning. Many solutions for autonomous network operation based on Machine Learning require a global network view, and thus need to be deployed at the Software-Defined Networking (SDN) controller. Consequently, these solutions require implementing control loops, where algorithms running in the controller use telemetry measurements collected at the data plane to make decisions that are then applied at the data plane. Such control loops are well suited to provisioning and failure management, but not to near-real-time operation because of their long response times. In this paper, we propose a distributed approach for autonomous near-real-time flow routing with QoS assurance. Our solution brings intelligence closer to the data plane to reduce response times; it is based on the combined application of Deep Reinforcement Learning (DRL) and Multi-Agent Systems (MAS) to create a distributed collaborative network control plane. Node agents ensure the QoS of traffic flows, specifically end-to-end delay, while minimizing routing costs by making distributed routing decisions. Algorithms in the centralized network controller provide the agents with the set of routes that can be used for each traffic flow and give the agents freedom to use them during operation. Results show that the proposed solution ensures end-to-end delay under the desired maximum and greatly reduces routing costs. This performance is achieved in dynamic scenarios without prior knowledge of the traffic profile or the background traffic, for both single-domain and multi-domain networks.