Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

Lin, Junfan; Huang, Zhongzhan; Wang, Keze; Liang, Xiaodan; Chen, Weiwei; Li, Lin

doi:10.48550/arxiv.2011.14487

Cited by 1 publication (14 citation statements)

References 50 publications (72 reference statements)


“…Several reinforcement learning methodologies make use of Mixup-interpolated experiences for training reinforcement learning agents. In Continuous Transition (Lin et al 2020), temporally-adjacent transitions are interpolated with Mixup, generating synthetic transitions between pairs of consecutive transitions. In MixReg (Wang et al 2020), generated transitions are formed using Mixup on combinations of input and output signals.…”

Section: Related Work (mentioning)

confidence: 99%
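
The interpolation described in the statement above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dictionary layout of a transition, the function name `mixup_consecutive`, and the choice to draw the coefficient from Beta(α, α) as in standard Mixup are all assumptions made for the example.

```python
import random

def mixup_consecutive(t0, t1, alpha=1.0, rng=random):
    """Interpolate two temporally adjacent transitions with Mixup.

    Each transition is a dict of equal-length float lists
    (hypothetical layout: 'state', 'action', 'reward', 'next_state').
    """
    # Assumption: the Mixup coefficient is drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    mixed = {
        key: [lam * a + (1.0 - lam) * b for a, b in zip(t0[key], t1[key])]
        for key in t0
    }
    return mixed, lam

# Two consecutive steps from one trajectory (toy values).
step_t = {"state": [0.0, 0.0], "action": [1.0], "reward": [0.0], "next_state": [1.0, 0.0]}
step_t1 = {"state": [1.0, 0.0], "action": [0.5], "reward": [1.0], "next_state": [2.0, 0.0]}

synthetic, lam = mixup_consecutive(step_t, step_t1, rng=random.Random(0))
```

Every field of the synthetic transition is the same convex combination of the two real ones, so it lies on the line segment between two consecutive steps.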

“…While these approaches increase the training domain via interpolation, they do not strictly enforce geometric transition proximity of the resulting samples. Proximity between the points used for sampling is encoded temporally, as in (Lin et al 2020; Sinha, Mandlekar, and Garg 2021), but not in the geometric transition space of the agent's experience. NMER employs a nearest neighbor heuristic to encourage transition pairs for Mixup to be located approximately within the same dynamics regimes in the transition manifold.…”

Section: Related Work (mentioning)

confidence: 99%

“…NMER employs a nearest neighbor heuristic to encourage transition pairs for Mixup to be located approximately within the same dynamics regimes in the transition manifold. Compared to Continuous Transition (Lin et al 2020) and S4RL (Sinha, Mandlekar, and Garg 2021), samples interpolated with NMER may better preserve the local dynamics of the environment and enable further agent regularization through inter-episode interpolation between transitions and their dynamic sets of nearest neighbors.…”

Section: Related Work (mentioning)

confidence: 99%

“…Implementation details and ablation studies for each replay buffer variant are provided in the technical report. We measure replay buffer sample efficiency using the evaluation reward of the reinforcement learning agent after 200K environment interactions have been sampled, as in (Lin et al 2020; K. Lee et al 2020).…”

Section: Continuous Control Evaluation (mentioning)

confidence: 99%

“…We compare NMER to the following baselines: (i) Uniform, Vanilla Replay (U) (Mnih et al 2013; Engel, Mannor, and Meir 2005), where transitions are sampled i.i.d. uniformly from the replay buffer, (ii) Prioritized Experience Replay (PER) (Schaul et al 2016) with stochastic prioritization, (iii) Continuous Transition (CT) (Lin et al 2020). Since the main comparison between NMER and CT is how samples are selected for interpolation, we make two modifications to the original CT baseline: (a) we remove the automatic Mixup α hyperparameter tuning mechanism, and (b) if a terminal state is encountered in either the sample or neighbor transition, no interpolation occurs, and the sampled transition is simply used for training the agent.…”

Section: Continuous Control Evaluation (mentioning)

confidence: 99%
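
Modification (b) in the statement above amounts to a guard in front of the interpolation step. A rough sketch follows, assuming a hypothetical transition layout with a boolean `done` flag and a fixed coefficient `lam` (consistent with modification (a), no automatic tuning is shown); the function name is illustrative only.

```python
def mix_unless_terminal(sample, neighbor, lam):
    """Interpolate two transitions, or return `sample` unchanged
    if either transition is terminal."""
    if sample["done"] or neighbor["done"]:
        # No interpolation: train on the sampled transition as-is.
        return sample
    mixed = {"done": False}
    for key in ("state", "action", "reward", "next_state"):
        mixed[key] = [lam * a + (1.0 - lam) * b
                      for a, b in zip(sample[key], neighbor[key])]
    return mixed

a = {"state": [0.0], "action": [1.0], "reward": [0.0], "next_state": [1.0], "done": False}
b = {"state": [2.0], "action": [3.0], "reward": [1.0], "next_state": [3.0], "done": True}
c = {"state": [2.0], "action": [3.0], "reward": [1.0], "next_state": [3.0], "done": False}

passthrough = mix_unless_terminal(a, b, lam=0.25)  # b is terminal, so a is returned
blended = mix_unless_terminal(a, c, lam=0.25)      # both non-terminal, so they are mixed
```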


Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks

Sander, Schwarting, Seyde et al. 2022

Preprint


Experience replay plays a crucial role in improving the sample efficiency of deep reinforcement learning agents. Recent advances in experience replay propose using Mixup (Zhang et al. 2018) to further improve sample efficiency via synthetic sample generation. We build upon this technique with Neighborhood Mixup Experience Replay (NMER), a geometrically-grounded replay buffer that interpolates transitions with their closest neighbors in state-action space. NMER preserves a locally linear approximation of the transition manifold by only applying Mixup between transitions with vicinal state-action features. Under NMER, a given transition's set of state-action neighbors is dynamic and episode agnostic, in turn encouraging greater policy generalizability via inter-episode interpolation. We combine our approach with recent off-policy deep reinforcement learning algorithms and evaluate on continuous control environments. We observe that NMER improves sample efficiency by an average of 94% (TD3) and 29% (SAC) over baseline replay buffers, enabling agents to effectively recombine previous experiences and learn from limited data.
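
The neighbor-based sampling summarized in the abstract can be illustrated with a small sketch. It is not the NMER implementation: the Euclidean distance over raw concatenated state-action features, the neighborhood size `k`, the uniform choice among neighbors, the Beta(α, α) coefficient, and all function names are assumptions made for the example.

```python
import math
import random

def k_nearest(buffer, i, k):
    """Indices of the k transitions closest to buffer[i] in state-action space."""
    query = buffer[i]["state"] + buffer[i]["action"]
    dists = [(math.dist(query, t["state"] + t["action"]), j)
             for j, t in enumerate(buffer) if j != i]
    return [j for _, j in sorted(dists)[:k]]

def neighborhood_mixup_sample(buffer, k=1, alpha=1.0, rng=random):
    """Draw one transition, pick one of its neighbors, and interpolate the pair."""
    i = rng.randrange(len(buffer))
    j = rng.choice(k_nearest(buffer, i, k))
    lam = rng.betavariate(alpha, alpha)  # assumed Mixup coefficient
    return {key: [lam * a + (1.0 - lam) * b
                  for a, b in zip(buffer[i][key], buffer[j][key])]
            for key in buffer[i]}

# Toy buffer with two clusters; episode membership plays no role in the search.
buffer = [
    {"state": [0.0], "action": [0.0], "reward": [0.0], "next_state": [0.1]},
    {"state": [0.1], "action": [0.0], "reward": [0.0], "next_state": [0.2]},
    {"state": [5.0], "action": [0.0], "reward": [1.0], "next_state": [5.2]},
    {"state": [5.2], "action": [0.0], "reward": [1.0], "next_state": [5.4]},
]
mixed = neighborhood_mixup_sample(buffer, k=1, rng=random.Random(0))
```

Because neighbors are recomputed from the buffer contents rather than from time order, a transition can be paired with one from a different episode, which is the inter-episode interpolation the abstract refers to.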
