“…We compare NMER to the following baselines: (i) Uniform, Vanilla Replay (U) (Mnih et al. 2013; Engel, Mannor, and Meir 2005), in which transitions are sampled i.i.d. uniformly from the replay buffer; (ii) Prioritized Experience Replay (PER) (Schaul et al. 2016) with stochastic prioritization; and (iii) Continuous Transition (CT) (Lin et al. 2020). Since the main difference between NMER and CT is how samples are selected for interpolation, we make two modifications to the original CT baseline: (a) we remove the automatic Mixup α hyperparameter tuning mechanism, and (b) if a terminal state is encountered in either the sampled or the neighbor transition, no interpolation occurs and the sampled transition is used as-is to train the agent.…”
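The two CT modifications can be illustrated with a minimal sketch, assuming a standard Mixup-style interpolation in which a coefficient λ ~ Beta(α, α) blends every field of two transitions. Here α is held fixed (modification a), and interpolation is skipped whenever either transition is terminal (modification b). The names `Transition`, `mixup_transition` and `_lerp` are hypothetical and do not come from the NMER or CT code. Neighbor selection is intentionally left to the caller, because that choice is exactly where NMER and CT differ.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    # Hypothetical container for one (s, a, r, s', done) tuple.
    state: List[float]
    action: List[float]
    reward: float
    next_state: List[float]
    done: bool


def _lerp(x: List[float], y: List[float], lam: float) -> List[float]:
    # Elementwise convex combination: lam * x + (1 - lam) * y.
    return [lam * xi + (1.0 - lam) * yi for xi, yi in zip(x, y)]


def mixup_transition(sample: Transition, neighbor: Transition,
                     alpha: float, rng: random.Random) -> Transition:
    """Interpolate a sampled transition with a caller-chosen neighbor.

    Assumptions (sketch only): alpha is a fixed hyperparameter with no
    automatic tuning, and lambda ~ Beta(alpha, alpha) as in standard Mixup.
    """
    # Modification (b): if either transition is terminal, do not
    # interpolate; train on the sampled transition unchanged.
    if sample.done or neighbor.done:
        return sample
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in (0, 1)
    return Transition(
        state=_lerp(sample.state, neighbor.state, lam),
        action=_lerp(sample.action, neighbor.action, lam),
        reward=lam * sample.reward + (1.0 - lam) * neighbor.reward,
        next_state=_lerp(sample.next_state, neighbor.next_state, lam),
        done=False,  # both inputs are non-terminal at this point
    )


if __name__ == "__main__":
    rng = random.Random(0)
    a = Transition([0.0], [1.0], 1.0, [1.0], False)
    b = Transition([2.0], [3.0], 3.0, [3.0], False)
    print(mixup_transition(a, b, alpha=0.4, rng=rng))
```

Because the neighbor is passed in, the same routine serves either selection rule. A CT-style caller would pass whatever transition CT pairs with the sample, and an NMER-style caller would pass a nearest neighbor. This keeps the comparison focused on how samples are selected, which is the point of modifications (a) and (b).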