In multi-provider 5G/6G networks, service delegation enables administrative domains to federate in provisioning NFV network services. Admission control is fundamental to selecting the appropriate deployment domain so as to maximize the average profit without prior knowledge of the statistical distributions of service requests. This paper analyzes a general federation contract model for service delegation from several angles. First, under the assumption of known system dynamics, we obtain the theoretically optimal performance bound by formulating the admission control problem as an infinite-horizon Markov decision process (MDP) and solving it through dynamic programming. Second, we apply reinforcement learning to tackle the problem in practice, when the arrival and departure rates are unknown. Because Q-Learning maximizes discounted rewards, we prove that it is not an efficient solution, owing to its sensitivity to the discount factor. We therefore propose an average-reward reinforcement learning approach (R-Learning) to find the policy that directly maximizes the average profit. Finally, we evaluate the different solutions through extensive simulations and experimentally on the 5Growth platform. The results confirm that the proposed R-Learning solution always outperforms Q-Learning and the greedy policies. Furthermore, while its optimality gap is at most 9% in the ideal simulation environment, it competes with the MDP solution in the experimental assessment.
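To make the distinction concrete, the following is a minimal sketch of the classical average-reward R-Learning update (Schwartz-style), the family of rule the R-Learning approach above belongs to. The state, action, and reward names are illustrative placeholders, not the paper's actual admission-control model, and the step sizes are arbitrary defaults.

```python
from collections import defaultdict

def r_learning_update(Q, rho, s, a, r, s_next, actions,
                      alpha=0.1, beta=0.01):
    """One average-reward R-Learning step (illustrative sketch).

    Q   : dict mapping (state, action) -> relative action value
    rho : current estimate of the average reward per step
    Returns the updated rho; Q is modified in place.
    """
    best_here = max(Q[(s, b)] for b in actions)  # greedy value in s (pre-update)
    greedy = Q[(s, a)] == best_here              # was the taken action greedy?
    best_next = max(Q[(s_next, b)] for b in actions)
    # Relative-value TD error: reward minus average reward, plus bootstrap.
    # Note: no discount factor appears, unlike the Q-Learning update.
    delta = r - rho + best_next - Q[(s, a)]
    Q[(s, a)] += alpha * delta
    if greedy:
        # The average-reward estimate is refined only on greedy transitions.
        rho += beta * (r + best_next - best_here - rho)
    return rho
```

Unlike Q-Learning, whose target discounts future rewards by a factor that must be tuned, this rule subtracts a learned average reward `rho` from each step, which is what lets the policy optimize the average profit directly.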