“…DICE-based Methods

DICE-based methods perform stationary distribution estimation, and many of them have been proposed for off-policy evaluation: DualDICE (Nachum et al., 2019a), GenDICE (Zhang et al., 2019a), and GradientDICE (Zhang et al., 2020). Other lines of work apply them to reinforcement learning: AlgaeDICE (Nachum et al., 2019b), OptiDICE (Lee et al., 2021), COptiDICE (Lee et al., 2022), and f-DVL (Sikchi et al., 2023); offline policy selection: ; offline imitation learning: ValueDICE (Kostrikov et al., 2020), OPOLO (Zhu et al., 2020), IQ-Learn (Garg et al., 2021), DemoDICE (Kim et al., 2021), and SmoDICE; and reward learning: RGM. All of these DICE methods use either the true-gradient update or the semi-gradient update, whereas our paper proposes a new update rule: the orthogonal-gradient update.…”
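The three update rules differ in how they combine the gradient contributions of a DICE objective. The Python sketch below is illustrative only and rests on stated assumptions: it assumes the parameter gradient splits into a "forward" part (flowing through the current-state value V(s)) and a "backward" part (flowing through the next-state value V(s')). Under that assumption, the true-gradient update uses both parts and the semi-gradient update drops the backward part (a stop-gradient on V(s')). The orthogonal-gradient rule is assumed here to keep only the component of the backward part that is orthogonal to the forward part, scaled by a coefficient `eta`. All function names and `eta` are hypothetical, and this is not the paper's implementation.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def true_gradient(g_fwd: Vector, g_bwd: Vector) -> Vector:
    """Full gradient: forward and backward parts are simply summed."""
    return [f + b for f, b in zip(g_fwd, g_bwd)]


def semi_gradient(g_fwd: Vector, g_bwd: Vector) -> Vector:
    """Stop-gradient on the next-state term: the backward part is ignored."""
    return list(g_fwd)


def orthogonal_gradient(g_fwd: Vector, g_bwd: Vector, eta: float = 1.0) -> Vector:
    """Assumed orthogonal-gradient rule (hypothetical sketch).

    Removes the component of g_bwd parallel to g_fwd, so the backward part
    cannot cancel or reverse the forward direction, then adds the remaining
    orthogonal component scaled by eta.
    """
    norm_sq = dot(g_fwd, g_fwd)
    if norm_sq == 0.0:
        # No forward direction to project against; keep the backward part whole.
        g_perp = list(g_bwd)
    else:
        coef = dot(g_bwd, g_fwd) / norm_sq
        g_perp = [b - coef * f for f, b in zip(g_fwd, g_bwd)]
    return [f + eta * p for f, p in zip(g_fwd, g_perp)]


if __name__ == "__main__":
    g_f, g_b = [1.0, 0.0], [-0.5, 0.5]
    print(true_gradient(g_f, g_b))        # [0.5, 0.5]
    print(semi_gradient(g_f, g_b))        # [1.0, 0.0]
    print(orthogonal_gradient(g_f, g_b))  # [1.0, 0.5]
```

In this toy case the backward part partially opposes the forward part. The true gradient shrinks the forward component from 1.0 to 0.5, and the semi-gradient discards the backward information entirely. The assumed orthogonal rule keeps the full forward component while still using the orthogonal part of the backward gradient.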