In 5G, Multi-access Edge Computing (MEC) is critical for bringing computing and processing closer to users and enabling ultra-low-latency communications. When instantiating an application, it is essential to select the MEC host that minimizes latency while still fulfilling the application's requirements. However, as future 6G networks are expected to become even more geo-distributed and to be handled by multiple levels of management entities, this task becomes extremely difficult, and Machine Learning (ML) is expected to be a native part of the process. In this context, we propose a Reinforcement Learning model that selects the best possible host on which to instantiate a MEC application, aiming to minimize the end-to-end delay while fulfilling the application requirements. The proposed method uses Deep Q-Learning: at each step the model observes the environment state, takes an action, and is rewarded when it chooses correctly and penalized otherwise. By tuning the reward incentives, we have successfully trained a model that chooses the best possible host in terms of delay in a multi-level orchestration scenario, while meeting the applications' requirements. Simulation results over a series of MEC scenarios show a success rate of up to 96%, optimizing the delay in the long term.
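To make the reward-and-penalty idea concrete, the following is a minimal illustrative sketch, not the model evaluated in this work. It replaces the deep Q-network with a single-state tabular Q-learning update (each episode is one host selection), and every name in it (`Host`, `AppRequest`, `reward`, `train`), the resource fields, and the reward values are assumptions chosen for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical MEC host description (fields are illustrative)."""
    name: str
    delay_ms: float   # end-to-end delay between the user and this host
    free_cpu: int     # available CPU cores
    free_mem_gb: int  # available memory


@dataclass(frozen=True)
class AppRequest:
    """Hypothetical resource requirements of a MEC application."""
    cpu: int
    mem_gb: int


def feasible(host, app):
    """A host is feasible if it can satisfy the application's requirements."""
    return host.free_cpu >= app.cpu and host.free_mem_gb >= app.mem_gb


def reward(hosts, chosen, app):
    """Assumed reward shaping: -1 for an infeasible host, +1 for the
    lowest-delay feasible host, and a smaller positive value otherwise."""
    host = hosts[chosen]
    if not feasible(host, app):
        return -1.0
    best = min(h.delay_ms for h in hosts if feasible(h, app))
    if host.delay_ms == best:
        return 1.0
    return 0.5 * best / host.delay_ms


def train(hosts, app, episodes=500, alpha=0.1, epsilon=0.2, seed=0):
    """Epsilon-greedy Q-learning over a single state with one-step episodes,
    so the update target is simply the observed reward."""
    rng = random.Random(seed)
    q = [0.0] * len(hosts)
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.randrange(len(hosts))                 # explore
        else:
            action = max(range(len(hosts)), key=q.__getitem__)  # exploit
        q[action] += alpha * (reward(hosts, action, app) - q[action])
    return q


hosts = [
    Host("edge-a", delay_ms=5.0, free_cpu=1, free_mem_gb=8),     # too few cores
    Host("edge-b", delay_ms=10.0, free_cpu=4, free_mem_gb=8),
    Host("regional-c", delay_ms=20.0, free_cpu=8, free_mem_gb=32),
]
app = AppRequest(cpu=2, mem_gb=2)
q_values = train(hosts, app)
selected = hosts[max(range(len(hosts)), key=q_values.__getitem__)]
print(selected.name)
```

In this toy setting the closest host cannot meet the CPU requirement, so the learned values favour the nearest feasible host instead. A deep Q-network would replace the table `q` with a function of the observed environment state, which is what allows the approach to generalize across scenarios.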