2019
DOI: 10.48550/arxiv.1912.12970
Preprint

Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework

Abstract: This paper develops a Pontryagin differentiable programming (PDP) methodology to establish a unified end-to-end learning framework that solves a large class of learning and control tasks. The proposed PDP framework distinguishes itself from existing ones by two key techniques: first, by differentiating Pontryagin's Maximum Principle, the PDP framework allows end-to-end learning of a large class of parameterized systems, even when differentiation with respect to an unknown objective function is not readil…

Cited by 4 publications (3 citation statements)
References 25 publications
“…Analogously to the parameter fitting process for the 2OL-Eq and MinJerk models, we identify optimal values for the cost weights ω_v, ω_f, and ω_r collected in the parameter vector Λ for each mean trajectory. Since these parameters only define the objective function J_N^(LQR) of the OCP, and the system matrices A and B are uniquely determined given the above fixed values of m, τ_1, and τ_2, the parameter fitting for the LQR can be regarded as an inverse optimal control problem [62]. As usual, we use the bi-level approach described in Section 4.2, i.e., at each iteration of the parameter fitting method, the OCP subject to the parameter vector Λ is solved as described above.…”
Section: Results Of Parameter Fitting
type: mentioning
confidence: 99%
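
The bi-level idea in the statement above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the citing paper: the double-integrator matrices `A` and `B`, the two hypothetical cost weights `(w_x, w_v)`, the fixed control weight `R`, and the plain grid search used as the outer loop. The inner level solves a finite-horizon LQR problem for a candidate weight vector; the outer level picks the weights whose optimal trajectory best matches an observed one.

```python
import numpy as np

# Hypothetical system (an assumption): discrete-time double integrator.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
R = np.array([[1.0]])          # control weight fixed to remove the scale ambiguity
x0 = np.array([1.0, 0.0])
N = 50                         # horizon length

def solve_lqr(Q):
    """Inner level: solve the finite-horizon LQR problem and roll out the state."""
    P = Q.copy()               # terminal cost taken equal to Q
    gains = []
    for _ in range(N):         # backward Riccati recursion
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()            # gains are now ordered from step 0 to N-1
    xs = [x0]
    for K in gains:            # forward rollout under the optimal feedback
        u = -K @ xs[-1]
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs)

def fit_weights(x_obs, grid):
    """Outer level: choose the weights whose optimal trajectory fits x_obs best."""
    best_loss, best_w = np.inf, None
    for w_x in grid:
        for w_v in grid:
            x = solve_lqr(np.diag([w_x, w_v]))   # the OCP is re-solved per candidate
            loss = np.sum((x - x_obs) ** 2)
            if loss < best_loss:
                best_loss, best_w = loss, (w_x, w_v)
    return best_w, best_loss

# Synthetic "observed" trajectory generated with known weights.
w_true = (10.0, 0.1)
x_obs = solve_lqr(np.diag(w_true))

grid = 10.0 ** np.linspace(-2, 2, 41)
w_hat, best_loss = fit_weights(x_obs, grid)
print("recovered weights:", w_hat)
```

A grid search keeps the sketch dependency-free; a gradient-based outer loop would need the derivative of the optimal trajectory with respect to the weights, which is the quantity that differentiating the optimality conditions is meant to supply.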
“…To inject an appropriate inductive bias into the modeling procedure, recent work shows the possibility of embedding differentiable optimization problems as layers in an end-to-end framework [1]. To differentiate through an optimization problem, one could either unroll the numerical computation [17,4] or implicitly differentiate the optimality conditions, such as the KKT conditions in quadratic programs (QP) [3], the Pontryagin minimum principle in optimal control problems [25], and the Euler-Lagrange equations in least-action problems [36]. As the solution to a VI problem can usually be characterized as a fixed-point equation via a projection operator, which is equivalent to a QP problem, our work is built on some results in [3,1].…”
Section: Contribution This Paper Provides a Unified Framework For Lea...
type: mentioning
confidence: 99%
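
The two routes named in the statement above, unrolling the solver versus implicitly differentiating the optimality conditions, can be compared on a toy problem. The sketch below is an illustration under stated assumptions, not the method of any cited paper: the objective f(x, θ) = ½‖x − θc‖² + ¼Σxᵢ⁴, the vector `c`, and the step size are all hypothetical. The optimality condition is ∇ₓf(x*, θ) = 0, and the implicit function theorem gives dx*/dθ = −(∇²ₓₓf)⁻¹ ∇²ₓθf.

```python
import numpy as np

c = np.array([1.0, -2.0, 0.5])   # hypothetical problem data

def grad_x(x, theta):
    """Gradient of f(x, theta) = 0.5*||x - theta*c||^2 + 0.25*sum(x^4) w.r.t. x."""
    return x - theta * c + x**3

def solve_unrolled(theta, steps=500, lr=0.1):
    """Gradient descent, propagating s = dx/dtheta through every iteration."""
    x = np.zeros_like(c)
    s = np.zeros_like(c)
    for _ in range(steps):
        # Derivative of the update rule, evaluated at the current iterate x.
        s = s - lr * ((1.0 + 3.0 * x**2) * s - c)
        x = x - lr * grad_x(x, theta)
    return x, s

def implicit_sensitivity(x_star):
    """Differentiate the optimality condition grad_x(x*, theta) = 0 directly."""
    H = np.diag(1.0 + 3.0 * x_star**2)   # Hessian of f in x at the solution
    cross = -c                            # mixed derivative d(grad_x)/d(theta)
    return -np.linalg.solve(H, cross)

theta = 1.5
x_star, s_unrolled = solve_unrolled(theta)
s_implicit = implicit_sensitivity(x_star)

# Central finite difference as an independent check.
h = 1e-5
s_fd = (solve_unrolled(theta + h)[0] - solve_unrolled(theta - h)[0]) / (2 * h)

print("unrolled:", s_unrolled)
print("implicit:", s_implicit)
```

The implicit route needs only the converged solution and one linear solve, whereas unrolling carries a derivative through every solver iteration; both should agree once the solver has converged.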
“…Unlike the majority of work in trajectory optimization and control, we use Pontryagin's maximum principle to learn global feedback policies instead of fitting single trajectories. Concurrently, Jin et al. (2019) explored differentiating through Pontryagin's maximum principle. However, they still considered the standard time discretizations, e.g.…”
Section: Related Work
type: mentioning
confidence: 99%