“…As a result, many heuristic RMSA schemes are proposed [3]. Recently, deep reinforcement learning (DRL) based approaches rise and show advantage over the conventional heuristic methods, thanks to the development of machine learning [4][5][6][7]. Under the DRL framework, the agent observes the EON state, emits a RMSA action and receives a reward from the EON which is to reflect the goodness of the action.…”