2020
DOI: 10.48550/arxiv.2011.14487
Preprint
Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

Abstract: Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, it remains challenging to apply to real-world tasks due to poor sample efficiency. Attempting to overcome this shortcoming, several works focus on reusing the collected trajectory data during training by decomposing trajectories into a set of policy-irrelevant discrete transitions. However, their improvements are somewhat marginal since i) the number of transitions is usually small, and ii) the…

Cited by 1 publication (14 citation statements)
References 50 publications (72 reference statements)
“…Several reinforcement learning methodologies make use of Mixup-interpolated experiences for training reinforcement learning agents. In Continuous Transition (Lin et al. 2020), temporally adjacent transitions are interpolated with Mixup, generating synthetic transitions between pairs of consecutive transitions. In MixReg (Wang et al. 2020), generated transitions are formed using Mixup on combinations of input and output signals.…”

Section: Related Work
Confidence: 99%
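The interpolation of consecutive transitions described in the excerpt above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the transition layout (dicts with state, action, reward, next state) and the Beta mixing parameter `alpha=0.8` are assumptions.

```python
import numpy as np

def mixup_consecutive_transitions(t1, t2, alpha=0.8):
    """Interpolate two temporally adjacent transitions with Mixup.

    t1, t2 are dicts with keys 's', 'a', 'r', 's_next' holding the
    state, action, reward, and next state of consecutive timesteps.
    The mixing weight lambda is drawn from Beta(alpha, alpha), as in
    standard Mixup; every field is interpolated with the same lambda
    so the synthetic transition lies "between" the two real ones.
    """
    lam = np.random.beta(alpha, alpha)
    return {k: lam * t1[k] + (1.0 - lam) * t2[k] for k in t1}

# Toy example: two consecutive transitions with 2-D states, 1-D actions.
t1 = {"s": np.array([0.0, 0.0]), "a": np.array([1.0]),
      "r": 1.0, "s_next": np.array([0.1, 0.2])}
t2 = {"s": np.array([0.1, 0.2]), "a": np.array([0.5]),
      "r": 0.5, "s_next": np.array([0.2, 0.3])}
synthetic = mixup_consecutive_transitions(t1, t2)
```

Because `t2` starts where `t1` ends, the interpolated sample approximates a transition the agent could plausibly have experienced along that trajectory segment.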
“…While these approaches increase the training domain via interpolation, they do not strictly enforce geometric transition proximity of the resulting samples. Proximity between the points used for sampling is encoded temporally, as in (Lin et al. 2020; Sinha, Mandlekar, and Garg 2021), but not in the geometric transition space of the agent's experience. NMER employs a nearest neighbor heuristic to encourage transition pairs for Mixup to be located approximately within the same dynamics regimes in the transition manifold.…”

Section: Related Work
Confidence: 99%
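The nearest-neighbor pairing heuristic mentioned in the excerpt above can be sketched as follows. This is an assumption-laden illustration, not NMER's actual implementation: the flattened-transition representation and the Euclidean distance over the full row are stand-ins for whatever feature space and metric the method actually uses.

```python
import numpy as np

def nearest_neighbor_mixup(buffer, idx, alpha=0.8):
    """Pair a sampled transition with its nearest neighbor before Mixup.

    buffer: array of shape (N, d), each row a flattened transition
    (e.g. state, action, reward, next state concatenated). The sampled
    row is paired with its nearest neighbor under Euclidean distance,
    so the Mixup interpolation stays within a similar dynamics regime
    rather than mixing geometrically distant experiences.
    """
    query = buffer[idx]
    dists = np.linalg.norm(buffer - query, axis=1)
    dists[idx] = np.inf                      # exclude the query itself
    neighbor = buffer[np.argmin(dists)]
    lam = np.random.beta(alpha, alpha)
    return lam * query + (1.0 - lam) * neighbor
```

Contrast with the temporal pairing above: here the pair is chosen by proximity in the transition space itself, so the synthetic sample is a convex combination of two experiences that were already close, regardless of when they were collected.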