2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros.2017.8206046
Deep predictive policy training using reinforcement learning

Abstract: Skilled robot task learning is best implemented by predictive action policies due to the inherent latency of sensorimotor processes. However, training such predictive policies is challenging as it involves finding a trajectory of motor activations for the full duration of the action. We propose a data-efficient deep predictive policy training (DPPT) framework with a deep neural network policy architecture which maps an image observation to a sequence of motor activations. The architecture consists of three sub…
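The abstract is truncated here, so the three sub-networks are not described in this report. As a rough illustration only, the sketch below shows the general shape of a predictive policy of the kind the abstract describes: a convolutional perception module compresses a single image observation into a low-dimensional state, and a generator head expands that state into motor activations for the full action horizon in one forward pass. All names and sizes (PredictivePolicy, n_joints, horizon, layer widths) are illustrative assumptions, not the paper's actual architecture.

import torch
import torch.nn as nn

class PredictivePolicy(nn.Module):
    # Illustrative sketch: one image in, a full trajectory of motor
    # activations out. Names and dimensions are assumptions, not the
    # DPPT architecture itself.
    def __init__(self, n_joints=7, horizon=20, state_dim=32):
        super().__init__()
        # Perception: convolutional encoder from image to a low-dim state.
        self.perception = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, state_dim), nn.ReLU(),
        )
        # Trajectory generator: motor activations for the whole action
        # duration are produced in a single forward pass.
        self.generator = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_joints * horizon), nn.Tanh(),
        )
        self.n_joints, self.horizon = n_joints, horizon

    def forward(self, image):
        state = self.perception(image)
        traj = self.generator(state)
        return traj.view(-1, self.horizon, self.n_joints)

# Example: one 64x64 RGB observation -> a (1, 20, 7) activation trajectory.
policy = PredictivePolicy()
trajectory = policy(torch.zeros(1, 3, 64, 64))

Predicting the whole trajectory at once, rather than one activation per control step, is what the abstract motivates with the inherent latency of sensorimotor processes.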

Cited by 109 publications (105 citation statements). References 31 publications.

Citation statements (ordered by relevance):
“…Another approach is domain adaptation from the simulator [8,12,18,39,7,52], since it may be easier to fine-tune from a simulator policy than to train in the real world. However, if the simulator differs from the real world, the policy trained in simulation can perform very poorly in the real world and fine-tuning may not be any easier than training from scratch.…”
Section: Transfer From Simulation To The Real World
confidence: 99%
“…These physical robots could also damage themselves and their environment while exploring these behaviours. A recent approach to circumvent these challenges is to train on a simulated version of the robot and then transfer the learned policy to the real robot [8,12,18,39,7,52,40,13].…”
Section: Introduction
confidence: 99%
“…2) Ground truth 2D image coordinates (GT-2D): z is similar to the previous baseline, but the points are projected into the camera using the ground truth camera parameters. 3) Autoencoder (AE): z is the encoding of a pre-trained autoencoder, similar to the visual training in [2], [11]. 4) End-to-End (E2E): z is the intermediate representation from end-to-end training.…”
Section: A Simulation Experimental Setup
confidence: 99%
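The AE baseline quoted above takes z from the code of a pre-trained autoencoder. A minimal sketch of that pattern, assuming a small convolutional autoencoder trained purely on image reconstruction (the class name ConvAutoencoder, z_dim=16, and 64x64 inputs are illustrative assumptions, not the cited papers' models):

import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    # Illustrative autoencoder baseline: the bottleneck code z is later
    # reused as the visual representation for a policy.
    def __init__(self, z_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, z_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(z_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

# One pre-training step on reconstruction loss; afterwards encoder(x)
# supplies z to the downstream policy and the decoder is discarded.
model = ConvAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.rand(8, 3, 64, 64)  # stand-in for camera frames
recon, z = model(images)
loss = nn.functional.mse_loss(recon, images)
loss.backward()
opt.step()

Because the autoencoder is trained before, and independently of, the policy, z stays fixed during policy learning; the end-to-end (E2E) baseline instead trains the same encoder jointly with the policy, which is the only difference the next statement points out.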
“…Note that the two vision-based baselines AE and E2E share an identical model architecture for producing z, and differ only in the method used to train the parameters. The model is close to [1]-[3], with the key architectural trait of having a few convolutional layers followed by a channel-wise spatial expectation operation, which has been widely used [10], [11], [25], [29], [36]-[38]. Most methods we compare (AE, E2E, DD-2D) use only one RGB camera stream as input to learned policies; DD-3D additionally uses the depth image.…”
Section: A Simulation Experimental Setup
confidence: 99%
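The channel-wise spatial expectation operation mentioned above, often called a spatial softmax or soft-argmax, can be sketched directly. Each feature map is turned into a probability distribution over image locations, and the expected (x, y) coordinate per channel becomes the low-dimensional output. The function name and tensor shapes below are illustrative.

import torch

def spatial_expectation(features):
    # features: (B, C, H, W) activations from a conv stack.
    # For each channel, softmax over the H*W locations gives a probability
    # map; the expected coordinate under that map is returned as (B, C, 2).
    b, c, h, w = features.shape
    probs = torch.softmax(features.view(b, c, h * w), dim=-1).view(b, c, h, w)

    # Normalised pixel coordinate grids in [-1, 1].
    ys = torch.linspace(-1.0, 1.0, h).view(1, 1, h, 1)
    xs = torch.linspace(-1.0, 1.0, w).view(1, 1, 1, w)

    exp_x = (probs * xs).sum(dim=(2, 3))          # (B, C)
    exp_y = (probs * ys).sum(dim=(2, 3))          # (B, C)
    return torch.stack([exp_x, exp_y], dim=-1)    # (B, C, 2)

# Example: 32 feature maps -> 32 expected keypoints per image.
points = spatial_expectation(torch.randn(4, 32, 28, 28))  # shape (4, 32, 2)

Collapsing a (B, C, H, W) activation volume to C expected image coordinates yields the compact, spatially meaningful z that, per the statement above, both the AE and E2E baselines feed to their policies.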
“…Each PC/BC-DIM stage functions like a spatial auto-encoder, acting as an encoder for the transformation in one direction and as a decoder for the transformation in the opposite direction. The PC/BC-DIM auto-encoder thus differs profoundly from all previously reported work on similar eye-hand coordination tasks, where the auto-encoder employed separate encoder and decoder neural circuitry [36][37][38][39].…”
Section: PC/BC-DIM Spatial Auto-encoder
confidence: 99%