Composable Deep Reinforcement Learning for Robotic Manipulation
2018 · Preprint · DOI: 10.48550/arxiv.1803.06773

Cited by 8 publications (12 citation statements) · References 0 publications

“…Recent work in RL for manipulation has tended to take a more tabula rasa approach, focusing on learning policies that output joint torques directly or that output position (and velocity) references to an underlying PD controller. Direct torque control has been used to learn many physical and simulated tasks, including peg insertion, placing a coat hanger, hammering, screwing a bottle cap [6], door opening, pick and place tasks [5], and Lego stacking tasks [20]. Learning position and/or velocity references to a fixed PD joint controller has been used for tasks such as door opening, hammering, object placement [21], Lego stacking [7], and in-hand manipulation [1].…”
Section: Introduction · mentioning · confidence: 99%

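To make the two action parameterizations mentioned above concrete, here is a minimal sketch of a fixed PD joint controller converting policy-output position/velocity references into joint torques. The gain values, the 7-DoF arm, and all placeholder states are assumptions for illustration only, not taken from any of the cited papers.

```python
import numpy as np

def pd_torques(q_ref, qd_ref, q, qd, kp, kd):
    """Standard PD law: track position/velocity references produced by a policy.

    q_ref, qd_ref : reference joint positions/velocities (policy output)
    q, qd         : measured joint positions/velocities
    kp, kd        : proportional and derivative gains (hypothetical values below)
    """
    return kp * (q_ref - q) + kd * (qd_ref - qd)

# Hypothetical 7-DoF arm: a torque policy would output tau directly, whereas a
# reference policy outputs (q_ref, qd_ref) and this fixed PD controller converts
# them to torques, typically at a higher control rate than the policy runs at.
kp = np.full(7, 50.0)
kd = np.full(7, 2.0)
q, qd = np.zeros(7), np.zeros(7)               # current joint state (placeholder)
q_ref, qd_ref = 0.1 * np.ones(7), np.zeros(7)  # policy output (placeholder)
tau = pd_torques(q_ref, qd_ref, q, qd, kp, kd)
```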
“…In contrast to such a greedy procedure, the maximum entropy objective considers entropy over entire policy trajectories [13,25,29]. Though entropy regularization is simpler to implement in practice, [12,13] argue in favor of the maximum entropy objective by showing that trained policies can be robust to noise, which is desirable for real-life robotics tasks, and multi-modal, a potentially desirable property for exploration and fine-tuning on downstream tasks. However, their training procedure is fairly complex, consisting of training a soft Q-function by fixed-point iteration and a neural sampler by Stein variational gradient [21].…”
Section: Related Work · mentioning · confidence: 99%

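As a reference point for the objective contrasted in the statement above, the following is a sketch of the maximum entropy objective and the soft Bellman fixed-point iteration used to train a soft Q-function (temperature α); the notation follows the usual max-ent RL convention rather than reproducing any one of the cited papers.

```latex
% Maximum entropy objective: entropy is accumulated over the whole trajectory,
% not added greedily at each step.
J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \big[\, r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \,\big]

% Soft Bellman fixed-point iteration for the soft Q-function:
Q(s_t, a_t) \leftarrow r(s_t, a_t) + \gamma \, \mathbb{E}_{s_{t+1}}\big[ V(s_{t+1}) \big],
\qquad
V(s) = \alpha \log \int_{\mathcal{A}} \exp\!\big( Q(s, a) / \alpha \big) \, da
```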
“…Equation (15) takes a similar form to Equation (11). Since we have already learned $Q(s_t, q_{1,t}, a_t)$ and $Q(s_t, q_{2,t}, a_t)$, and $Q_{q_{1,t} \wedge q_{2,t}}(s_t, q_t, a_t)$ is nonzero only when there are states $s_t$ where $D^{q_{1,t}}_{\varphi_1} \wedge D^{q_{2,t}}_{\varphi_2}$ is true, we should obtain a good initialization of $Q(s_t, q_t, a_t)$ by adding $Q(s_t, q_{1,t}, a_t)$ and $Q(s_t, q_{2,t}, a_t)$ (a similar technique is adopted by Haarnoja et al. [2018]). This addition of local Q-functions is in fact an optimistic estimate of the global Q-function; the properties of such Q-decomposition methods are studied by Russell and Zimdars [2003].…”
Section: FSA Augmented MDP · mentioning · confidence: 99%

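The additive initialization described in the statement above can be sketched as follows. This is a hypothetical tabular illustration of summing local Q-functions as an optimistic starting point for the composed task; the array shapes, names, and random placeholders are assumptions, not the cited authors' implementation.

```python
import numpy as np

# Hypothetical tabular Q-functions for two sub-tasks over the same
# (state, action) space, e.g. learned separately for sub-specifications
# phi_1 and phi_2.
n_states, n_actions = 100, 6
Q1 = np.random.randn(n_states, n_actions)  # stand-in for a learned Q(s, q_1, a)
Q2 = np.random.randn(n_states, n_actions)  # stand-in for a learned Q(s, q_2, a)

# Initialize the composed task's Q-function by addition; per the statement
# above this is an optimistic estimate of the global Q-function.
Q_init = Q1 + Q2

# The composed Q-function can then be fine-tuned with further RL; as an
# immediate policy one can act greedily (or via a Boltzmann distribution)
# with respect to the summed values.
greedy_actions = Q_init.argmax(axis=1)
```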
“…In stochastic optimal control, this idea has been adopted by Todorov [2009] and Da Silva et al. [2009] to construct provably optimal control laws based on linearly solvable Markov decision processes. Haarnoja et al. [2018] have shown in simulated and real manipulation tasks that approximately optimal policies can result from adding the Q-functions of the existing policies.…”
Section: Introduction · mentioning · confidence: 99%

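In the maximum entropy framework, the composition referred to in this statement is commonly written in the form sketched below (temperature α). This is only an illustrative sketch of composing by adding Q-functions, not a restatement of any specific theorem or bound from the cited works.

```latex
% Constituent soft Q-functions Q_1 and Q_2 induce an approximately optimal
% composed policy by adding the Q-functions:
\pi_{\mathrm{comp}}(a \mid s) \;\propto\;
\exp\!\Big( \tfrac{1}{\alpha}\big( Q_1(s, a) + Q_2(s, a) \big) \Big)
```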