AIAA Scitech 2020 Forum 2020
DOI: 10.2514/6.2020-1234

Closed-Loop Q-Learning Control of a Small Unmanned Aircraft

Cited by 8 publications (8 citation statements) · References 7 publications
“…Discrete action spaces have been used previously with both Deep Q-Network [6] and PPO [7]. The actions are encoded as a pair of arrays of angular rates for the wing sweep and elevator, with the action effectively selecting a pair of indices into these arrays.…”
Section: Continuous Actions
confidence: 99%
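
As a concrete illustration of this index-pair encoding, the sketch below builds a flat discrete action set as the Cartesian product of two rate tables and decodes an action id back into a (wing-sweep rate, elevator rate) pair. The rate values, table sizes, and function names are illustrative assumptions, not details taken from the cited papers.

```python
import itertools

# Candidate angular rates (deg/s) for each control surface. These values are
# illustrative placeholders, not the discretisation used in the cited papers.
SWEEP_RATES = [-40.0, -20.0, 0.0, 20.0, 40.0]
ELEVATOR_RATES = [-60.0, -30.0, 0.0, 30.0, 60.0]

# Flatten the two index choices into one discrete action set, so a DQN or
# PPO head outputs a single id over len(SWEEP_RATES) * len(ELEVATOR_RATES).
ACTIONS = list(itertools.product(range(len(SWEEP_RATES)),
                                 range(len(ELEVATOR_RATES))))

def decode_action(action_id: int) -> tuple[float, float]:
    """Map a flat action id to a (wing-sweep rate, elevator rate) pair."""
    i, j = ACTIONS[action_id]
    return SWEEP_RATES[i], ELEVATOR_RATES[j]
```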
“…Model-based RL tends to be more sample-efficient than model-free RL; however, it is more complex and challenging to train [14]. Clarke et al. used a Deep Q-Network (DQN), a model-free, value-based algorithm [6]. DQN, developed by Mnih et al., was the first deep-RL algorithm to be demonstrated successfully.…”
Section: Reinforcement Learning
confidence: 99%
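
For reference, a minimal sketch of the value update at the core of DQN as introduced by Mnih et al.: an online Q-network is regressed towards a bootstrapped target computed with a frozen copy of itself. The network sizes, state dimension, and hyperparameters below are assumptions for illustration, not details of the cited controller.

```python
import torch
import torch.nn as nn

# Online and frozen target Q-networks. The 8-dimensional state and 25 discrete
# actions (e.g. a 5 x 5 index-pair grid) are illustrative assumptions.
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 25))
target_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 25))
target_net.load_state_dict(q_net.state_dict())
optimiser = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99  # discount factor

def dqn_update(s, a, r, s_next, done):
    """One gradient step on the TD error for a batch of transitions."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the frozen target network; terminal states get no tail.
        target = r + gamma * target_net(s_next).max(dim=1).values * (1.0 - done)
    loss = nn.functional.mse_loss(q_sa, target)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()
```

In a full agent this step would be driven from an experience-replay buffer, with target_net refreshed from q_net at a fixed interval.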
“…In this work, Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimisation (TRPO) and Proximal Policy Optimisation (PPO) were trained and compared in simulation, and the PPO algorithm showed superior performance overall, including outperforming a conventional controller. Fixed-wing UAVs have also seen RL algorithms used for control, showing high levels of performance for attitude control in simulation [38], for angle-of-attack control in wind tunnel tests [25], and for perched landing [39][40][41]. A Q-learning RL algorithm was used to obtain a policy-based agent that finds the best flight profile for a perching manoeuvre in open-loop flight tests [39].…”
Section: Introduction
confidence: 99%
“…A Q-learning RL algorithm was used to obtain a policy-based agent that finds the best flight profile for a perching manoeuvre in open-loop flight tests [39]. Flight tests of the same platform in a closed-loop configuration [40] followed, in which a Deep Q-Network (DQN) algorithm controlled the aircraft during flight but suffered from the reality gap between the simulation environment it had learnt in and the flight tests. More recently, flight tests showed a reduced reality gap when an improved version of this controller was developed using PPO [41].…”
Section: Introduction
confidence: 99%
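
To make the contrast with the closed-loop DQN concrete, the sketch below shows a standard one-step Q-learning update of the kind that could produce a fixed flight profile offline. Whether [39] used a tabular representation is an assumption here; the state and action counts and hyperparameters are placeholders.

```python
import numpy as np

# Tabular Q-learning: state/action counts and hyperparameters are placeholders.
n_states, n_actions = 50, 25
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def epsilon_greedy(s: int) -> int:
    """Explore with probability epsilon, otherwise exploit the current Q."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(Q[s].argmax())

def q_learning_step(s: int, a: int, r: float, s_next: int) -> None:
    """Standard one-step Q-learning update towards the bootstrapped target."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```

Once training converges, greedily following Q from the launch state yields a fixed action sequence that can then be flown open loop.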