“…In RL, it may refer either to training the agent on successive tasks of increasing complexity, until the desired complexity is reached [17,27,57,59,61,65], or, more commonly, to supplementing the MDP's reward function with additional, artificial rewards [3,14,15,24,38,49,50,53,85]. This article employs shaping functions in the latter sense.…”
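To make the latter sense concrete, a minimal sketch of additive reward shaping follows; the notation is assumed for illustration and is not taken from the excerpt: $R$ denotes the MDP's original reward function and $F$ the shaping function that supplies the artificial reward.

% Shaped reward: the agent is trained on R' in place of the original R.
% R : original MDP reward (assumed notation)
% F : shaping function providing the additional, artificial reward (assumed notation)
$$ R'(s, a, s') \;=\; R(s, a, s') \;+\; F(s, a, s') $$

Under this reading, designing a shaping function amounts to choosing $F$; the agent's learning problem is otherwise unchanged, since only the reward signal it observes is modified.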