“…For each asset, the path discovery problem is formulated as a Reinforcement Learning (RL) task [18]. Considering an agent taking decisions in a statistical environment, RL is a methodology, widely used in the literature for routing problems (e.g., [12]) which computes the agent's optimal policy based on the observation of the environment after a decision is taken. In brief, the RL task is organized in stages t ∈ [1,2,…,T], where T can also be infinite, and requires the definition of a finite state space S, a finite set A(s) of actions a which can be chosen in state s, a reward function r which maps each state to a real number.…”