Abstract-A set of neural networks is employed to develop control policies that outperform fixed, theoretically optimal policies when applied to a combined physical inventory and distribution system in a nonstationary demand environment. Specifically, we show that model-based adaptive critic approximate dynamic programming techniques can be used with systems characterized by discrete-valued states and controls. The control policies embodied by the trained neural networks outperformed the best fixed policies (found by either linear programming or genetic algorithms) in a high-penalty-cost environment with time-varying demand.

Index Terms-Adaptive critics, approximate dynamic programming, artificial neural networks, dual heuristic programming, genetic algorithms, supply chain management.