“…The idea is to learn a policy at simulation time, when a collective view of the system is available, and then to deploy that policy at runtime using only local observations. The typical approach in such cases is based on actor–critic systems [32], [33], [34], [35], where the actor is the distributed policy (acting on local information only) and the critic is a neural network that takes the overall system state as input. Mean-field RL [17] is one concrete application of CTDE in which the interactions within the population of agents are approximated by considering either the interaction between a single agent and the average effect of the entire population, or the influence of its neighbouring agents.…”
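The mean-field approximation described above can be illustrated with a minimal tabular sketch. This is not the method of [17] as published (which uses neural function approximation); it is a hypothetical toy in which each agent's Q-function conditions on its own action and on a discretised *mean* action of its neighbours, rather than on the exponentially large joint action of all agents. All sizes and helper names (`n_mean_bins`, `mf_q_update`) are illustrative assumptions.

```python
import numpy as np

# Toy sizes (assumptions, not from the source)
n_states, n_actions, n_mean_bins = 4, 3, 5
rng = np.random.default_rng(0)

# Q is indexed by (state, own action, binned mean neighbour action):
# the mean-field idea replaces the joint action with a population summary.
Q = np.zeros((n_states, n_actions, n_mean_bins))

def mean_action_bin(neighbour_actions):
    """Discretise the neighbours' empirical mean action into a bin index."""
    mean = np.mean(neighbour_actions) / (n_actions - 1)  # normalised to [0, 1]
    return min(int(mean * n_mean_bins), n_mean_bins - 1)

def mf_q_update(s, a, neighbour_actions, r, s_next, next_neighbour_actions,
                alpha=0.1, gamma=0.95):
    """One mean-field Q-learning step: the bootstrap target takes the best
    own action in the next state, given the *next* mean neighbour action."""
    b = mean_action_bin(neighbour_actions)
    b_next = mean_action_bin(next_neighbour_actions)
    target = r + gamma * Q[s_next, :, b_next].max()
    Q[s, a, b] += alpha * (target - Q[s, a, b])

# Exercise the update on a few random transitions.
for _ in range(100):
    s, a = rng.integers(n_states), rng.integers(n_actions)
    neigh = rng.integers(n_actions, size=6)
    next_neigh = rng.integers(n_actions, size=6)
    mf_q_update(s, a, neigh, rng.random(), rng.integers(n_states), next_neigh)
```

The key design point is the size of the table: conditioning on the mean action keeps it at `n_states × n_actions × n_mean_bins`, whereas conditioning on the joint action of `k` neighbours would grow as `n_actions**k`.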