Unsupervised exploration is an emerging and challenging topic in reinforcement learning (RL) that has inspired research interest in both applications [Riedmiller et al., 2018, Finn and Levine, 2017, Xie et al., 2018, Schaul et al., 2015] and theory [Hazan et al., 2018, Jin et al., 2020, Zhang et al., 2020a, Zhang et al., 2020b, Wu et al., 2020, Wang et al., 2020b]. The formal formulation of an unsupervised RL problem consists of an exploration phase and a planning phase [Jin et al., 2020]: in the exploration phase, an agent interacts with the unknown environment without the supervision of reward signals; then in the planning phase, the agent is prohibited from interacting with the environment and is required to compute a nearly optimal policy, for some revealed reward function, based on its exploration experiences. In particular, if the reward function is fixed yet unknown during exploration, the problem is called task-agnostic exploration (TAE) [Zhang et al., 2020a]; if the reward function may instead be chosen arbitrarily, the problem is called reward-free exploration (RFE) [Jin et al., 2020].
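
To make the two-phase protocol concrete, the following is a minimal sketch in Python for a tabular MDP. The environment interface (`reset()`, `step(action)` returning the next state), the uniform-random exploration strategy, and all class and variable names here are illustrative assumptions rather than the method of any cited work; practical RFE algorithms would replace the random exploration with a strategy that deliberately covers the state space.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

RewardFn = Callable[[int, int], float]  # maps (state, action) -> reward


class TwoPhaseAgent:
    """Sketch of the exploration/planning protocol on a tabular MDP.

    Assumed (hypothetical) environment interface: env.reset() -> state,
    env.step(action) -> next_state. States and actions are integer indices.
    """

    def __init__(self, n_states: int, n_actions: int) -> None:
        self.n_states, self.n_actions = n_states, n_actions
        # Empirical transition counts gathered during exploration.
        # Note: no rewards are stored -- none are observed in this phase.
        self.counts: Dict[Tuple[int, int], Dict[int, int]] = defaultdict(
            lambda: defaultdict(int))

    def explore(self, env, num_steps: int) -> None:
        """Exploration phase: interact without any reward signal.

        Uniform-random actions are a placeholder; RFE algorithms use a
        more careful exploration strategy to cover the state space.
        """
        s = env.reset()
        for _ in range(num_steps):
            a = random.randrange(self.n_actions)
            s_next = env.step(a)
            self.counts[(s, a)][s_next] += 1
            s = s_next

    def plan(self, reward_fn: RewardFn, gamma: float = 0.9,
             iters: int = 200) -> List[int]:
        """Planning phase: the reward function is revealed, but further
        interaction is forbidden. Run value iteration on the empirical
        model estimated from the exploration data alone."""
        def q_value(s: int, a: int, V: List[float]) -> float:
            nxt = self.counts[(s, a)]
            total = sum(nxt.values())
            if total == 0:
                # Unvisited pair: no transition estimate, so treat it as
                # terminal and fall back to the immediate reward only.
                return reward_fn(s, a)
            exp_v = sum(c / total * V[s2] for s2, c in nxt.items())
            return reward_fn(s, a) + gamma * exp_v

        V = [0.0] * self.n_states
        for _ in range(iters):
            for s in range(self.n_states):
                V[s] = max(q_value(s, a, V) for a in range(self.n_actions))

        # Return the greedy policy (state index -> action index).
        return [max(range(self.n_actions), key=lambda a: q_value(s, a, V))
                for s in range(self.n_states)]
```

Under this interface, TAE corresponds to calling `plan` once with the single reward function that was hidden during `explore`, while RFE requires the exploration data to be good enough that `plan` returns a near-optimal policy for any reward function it may later be given.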