Missile aerodynamic design using reinforcement learning and transfer learning

Yan, Xinghui; Zhu, Jihong; Kuang, Minchi; Wang, Xiangyang

doi:10.1007/s11432-018-9463-x

Cited by 6 publications

(5 citation statements)

References 5 publications


“…Different from directly approximating functions in supervised learning, reinforcement learning does not directly theorize or approximate how people make decisions. There are a limited number of studies of reinforcement learning in the field of fluid dynamics, most of which utilized reinforcement learning for active control problems [20,21], and very few of them attempted shape optimizations [22,23]. The present paper utilizes reinforcement learning for airfoil drag reduction and formulates its policy by interacting with the environment.…”

Section: II. Reinforcement Learning for Airfoil Aerodynamic Design (mentioning)

confidence: 99%

Learning the Aerodynamic Design of Supercritical Airfoils Through Deep Reinforcement Learning

Zhang

Chen

2021

AIAA Journal


The aerodynamic design of modern civil aircraft requires a true sense of intelligence since it requires a good understanding of transonic aerodynamics and sufficient experience. Reinforcement learning is an artificial general intelligence that can learn sophisticated skills by trial-and-error, rather than simply extracting features or making predictions from data. The present paper utilizes a deep reinforcement learning algorithm to learn the policy for reducing the aerodynamic drag of supercritical airfoils. The policy is designed to take actions based on features of the wall Mach number distribution so that the learned policy can be more general. The initial policy for reinforcement learning is pretrained through imitation learning, and the result is compared with randomly generated initial policies. The policy is then trained in environments based on surrogate models, of which the mean drag reduction of 200 airfoils can be effectively improved by reinforcement learning. The policy is also tested by multiple airfoils in different flow conditions using computational fluid dynamics calculations. The results show that the policy is effective in both the training condition and other similar conditions, and the policy can be applied repeatedly to achieve greater drag reduction.


“…Finally, substituting (8) into (7), we now obtain the following game algebraic Riccati equation (GARE):…”

Section: Problem Statement (mentioning)

confidence: 99%

“…The difficulty of obtaining the feedback Nash equilibrium in (8) lies in the solution to the nonlinear GARE in (9). Moreover, both (8) and (9) are dependent on the knowledge of system dynamics, i.e., A, B 1 , . .…”

Section: Problem Statement (mentioning)

confidence: 99%

“…Compared with dynamic programming (DP), RL runs forward in time (online) and overcomes the curse of dimensionality. In control theory, RL is referred to as adaptive dynamic programming (ADP), which has been widely used to solve optimal control problems [6–15] and dynamic games [16–31].…”

Section: Introduction (mentioning)

confidence: 99%


Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games

Peng

Jiao

et al. 2019

Sci. China Inf. Sci.


A model-based offline policy iteration (PI) algorithm and a model-free online Q-learning algorithm are proposed for solving fully cooperative linear quadratic dynamic games. The PI-based adaptive Q-learning method can learn the feedback Nash equilibrium online using the state samples generated by behavior policies, without sending inquiries to the system model. Unlike the existing Q-learning methods, this novel Q-learning algorithm executes both policy evaluation and policy improvement in an adaptive manner. We prove the convergence of the offline PI algorithm by proving its equivalence to Newton's method while solving the game algebraic Riccati equation (GARE). Furthermore, we prove that the proposed Q-learning method will converge to the Nash equilibrium under a small learning rate if the method satisfies certain persistence of excitation conditions, which can be easily met by suitable behavior policies. Our simulation results demonstrate the good performance of the proposed online adaptive Q-learning algorithm.
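The link between policy iteration and Newton's method that this abstract mentions can be illustrated on a simplified case. The sketch below is not the authors' algorithm: it assumes a discrete-time setting, and it assumes that a fully cooperative game with one shared quadratic cost can be treated as a single LQR problem whose input matrix stacks the players' matrices B_1, B_2 side by side. The matrices and the function names (`evaluate_policy`, `improve_policy`, `dare_residual`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: solve P = Q + K'RK + (A-BK)' P (A-BK) for P."""
    Ac = A - B @ K
    n = A.shape[0]
    rhs = (Q + K.T @ R @ K).reshape(-1)
    # vec(Ac' P Ac) = kron(Ac', Ac') vec(P), so the Lyapunov equation is linear in vec(P).
    M = np.eye(n * n) - np.kron(Ac.T, Ac.T)
    return np.linalg.solve(M, rhs).reshape(n, n)

def improve_policy(A, B, R, P):
    """Policy improvement: K = (R + B'PB)^{-1} B'PA."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Alternate evaluation and improvement, starting from a stabilizing gain K0."""
    K = K0
    for _ in range(iters):
        P = evaluate_policy(A, B, Q, R, K)
        K = improve_policy(A, B, R, P)
    return P, K

def dare_residual(A, B, Q, R, P):
    """Left-over of the discrete-time algebraic Riccati equation at P."""
    gain = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return Q + A.T @ P @ A - A.T @ P @ B @ gain - P

# Hypothetical two-player example: each player drives one input column.
A = np.array([[0.9, 0.2], [0.0, 0.8]])   # stable, so K0 = 0 is stabilizing
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[0.5], [0.0]])
B = np.hstack([B1, B2])                   # stacked inputs of the cooperating players
Q = np.eye(2)
R = np.eye(2)

P, K = policy_iteration(A, B, Q, R, K0=np.zeros((2, 2)))
residual = np.max(np.abs(dare_residual(A, B, Q, R, P)))
```

Each pass solves a linear (Lyapunov) equation and then updates the gain, which is the structure that makes the iteration equivalent to a Newton step on the Riccati equation in this simplified setting. The model-free Q-learning variant described in the abstract replaces the model-based evaluation step with estimates from state samples and is not shown here.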


“…To handle complex state spaces and achieve better generalization performance, researchers have proposed the concept of function approximators [1,6,7]. Inspired by the success of deep learning, researchers have applied deep neural networks to the reinforcement learning algorithms [8–12] and achieved impressive results in a wide range of fields such as Atari 2600 [12], non-zero-sum games [13], missile aerodynamic design [14], and music generation [15].…”

Section: Introduction (mentioning)

confidence: 99%

Accelerated value iteration via Anderson mixing

Li

Ni

Xie

et al. 2021

Sci. China Inf. Sci.


In this paper, we apply the Anderson acceleration technique to reinforcement learning tasks. We develop an accelerated value iteration algorithm, referred to as Anderson accelerated value iteration (A2VI), and an accelerated deep Q-learning algorithm, denoted deep Anderson accelerated Q-learning (DA2Q). The proposed approach improves the performance of value iteration by interpolating historical data. We perform a theoretical analysis of linear convergence and evaluate the proposed algorithms on synthetic experiments and classical control tasks. Both the theoretical and the empirical results confirm the effectiveness of the proposed algorithms.
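The idea of accelerating value iteration by combining past iterates can be sketched with generic Anderson mixing on a small tabular MDP. This is an illustration under stated assumptions, not the A2VI algorithm from the paper: the random MDP, the regularized least-squares solve, the memory size, and the fallback to a plain Bellman update are all choices made here for the example.

```python
import numpy as np

def bellman(V, P, R, gamma):
    """Bellman optimality operator. P has shape (A, S, S), R has shape (A, S)."""
    return np.max(R + gamma * (P @ V), axis=0)

def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Plain value iteration; returns the value estimate and iterations used."""
    V = np.zeros(P.shape[1])
    for k in range(max_iter):
        TV = bellman(V, P, R, gamma)
        if np.max(np.abs(TV - V)) < tol:
            return TV, k
        V = TV
    return V, max_iter

def anderson_value_iteration(P, R, gamma, memory=5, tol=1e-10, max_iter=10_000):
    """Value iteration with Anderson mixing over the last `memory` iterates."""
    V = np.zeros(P.shape[1])
    hist_TV, hist_res = [], []
    for k in range(max_iter):
        TV = bellman(V, P, R, gamma)
        res = TV - V
        if np.max(np.abs(res)) < tol:
            return TV, k
        hist_TV = (hist_TV + [TV])[-memory:]
        hist_res = (hist_res + [res])[-memory:]
        # Weights alpha minimize ||sum_i alpha_i * res_i|| subject to sum(alpha) = 1,
        # solved through lightly regularized normal equations.
        F = np.stack(hist_res, axis=1)
        G = F.T @ F
        G = G + 1e-8 * np.trace(G) * np.eye(G.shape[0])
        x = np.linalg.solve(G, np.ones(G.shape[0]))
        alpha = x / x.sum()
        V_acc = np.stack(hist_TV, axis=1) @ alpha
        # Safeguard (an assumption of this sketch): keep the mixed iterate only
        # if its Bellman residual is no worse than that of the plain update.
        res_acc = np.max(np.abs(bellman(V_acc, P, R, gamma) - V_acc))
        res_plain = np.max(np.abs(bellman(TV, P, R, gamma) - TV))
        V = V_acc if res_acc <= res_plain else TV
    return V, max_iter

# Hypothetical random MDP with 4 actions and 30 states.
rng = np.random.default_rng(0)
P = rng.random((4, 30, 30))
P /= P.sum(axis=2, keepdims=True)   # each row becomes a probability distribution
R = rng.random((4, 30))
gamma = 0.9

V_plain, iters_plain = value_iteration(P, R, gamma)
V_anderson, iters_anderson = anderson_value_iteration(P, R, gamma)
```

The safeguard costs two extra Bellman evaluations per iteration, but it keeps the residual shrinking at least as fast as plain value iteration, so both routines reach the same fixed point. Comparing `iters_plain` with `iters_anderson` shows how much the mixing helps on a given problem.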

