Modern embedded and cyber-physical applications consist of critical and non-critical tasks co-located on multiprocessor systems on chip (MPSoCs). Co-location causes contention for shared resources and thus interference on the interconnect, processing units, storage, etc. Hence, machine learning-based resource managers must operate even non-critical tasks within certain constraints to ensure proper execution of critical tasks. In this paper, we demonstrate and evaluate countermeasures based on backup policies that enhance rule-based reinforcement learning so that it enforces these constraints. Detailed experiments reveal the CPU performance degradation caused by the different designs, as well as their effectiveness in preventing constraint violations. Further, we exploit the interpretability of our approach to improve the resource manager’s operation by adding designers’ experience to the rule set.