“…The policy learning happens at two levels: each option policy is learned individually on the low level and the high-level controller learns which option to select in which state. Recently, there have been several works on defining symbolic options, allowing the RL agent to use reasoning instead of learning for finding (partially-ordered) plans over the set of options (Illanes, Yan, Icarte, & McIlraith, 2020;Lee, Katz, Agravante, Liu, Klinger, Campbell, Sohrabi, & Tesauro, 2021;Jin, Ma, Jin, Zhuo, Chen, & Yu, 2022). These approaches are very similar in spirit to policy sketches and future research could even define options based on sketch rules.…”