2019
DOI: 10.1109/access.2019.2917141
Actor-Critic Reinforcement Learning Control of Non-Strict Feedback Nonaffine Dynamic Systems

Abstract: Most existing work on actor-critic reinforcement learning control (ARLC) deals with continuous affine systems or discrete nonaffine systems. In this paper, I propose a new ARLC method for continuous nonaffine dynamic systems subject to unknown dynamics and external disturbances. A new input-to-state stable system is developed to establish an augmented dynamic system, from which I further obtain a strict-feedback affine model that is convenient for control design based on a model transformat…
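The abstract is cut off before the design details, but as a rough point of reference for what an actor-critic learning controller looks like, below is a minimal NumPy sketch for a scalar non-affine plant. The plant dynamics, radial-basis features, cost, and learning rates are all hypothetical placeholders and are not the construction proposed in the paper.

import numpy as np

rng = np.random.default_rng(0)
dt = 0.02

def plant(x, u):
    # Hypothetical unknown non-affine dynamics: the control u enters
    # through a nonlinearity, so the model is not affine in u.
    return x + dt * (np.sin(x) + u + 0.3 * np.tanh(u))

def features(e):
    # Radial-basis features of the tracking error, shared by actor and critic.
    centers = np.linspace(-2.0, 2.0, 9)
    return np.exp(-((e - centers) ** 2) / 0.5)

w_critic = np.zeros(9)                    # value (cost-to-go) weights
w_actor = np.zeros(9)                     # control-law weights
alpha_c, alpha_a, gamma, sigma = 0.05, 0.01, 0.95, 0.2

x = 0.0
for k in range(5000):
    ref = np.sin(0.01 * k)                # reference trajectory to track
    e = x - ref
    phi = features(e)

    u_mean = w_actor @ phi                # actor: nominal control
    u = u_mean + sigma * rng.standard_normal()   # exploration noise

    x_next = plant(x, u)
    e_next = x_next - np.sin(0.01 * (k + 1))
    cost = e ** 2 + 0.01 * u ** 2         # one-step tracking cost

    # Critic: semi-gradient TD(0) update of the approximate cost-to-go.
    td_error = cost + gamma * (w_critic @ features(e_next)) - w_critic @ phi
    w_critic += alpha_c * td_error * phi

    # Actor: Gaussian policy-gradient step; a positive TD error means the
    # explored control did worse than expected, so move the mean away from it.
    w_actor -= alpha_a * td_error * (u - u_mean) / sigma ** 2 * phi

    x = x_next

print("final tracking error:", abs(x - np.sin(0.01 * 5000)))

In the paper itself, the design is instead built on an augmented input-to-state stable system and the strict-feedback affine model obtained from it; this toy loop does not attempt to reproduce that construction.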

Cited by 16 publications (8 citation statements)
References 34 publications
“…The code is available at https://github.com/openai/maddpg. MADDPG is an extension of the actor-critic [29], [32] model. However, MADDPG has to train an independent policy network for each agent, where each agent learns a policy specialized to specific tasks [33] based on its own observation, and the policy network easily overfits to the number of agents.…”
Section: Multi-Agent Deep Deterministic Policy Gradient Methods
confidence: 99%
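For readers unfamiliar with the structure the quoted statement criticizes (one independent, decentralized policy per agent plus centralized critics), a hedged PyTorch sketch follows; the layer sizes and dimensions are invented placeholders, and this is not the code from the linked openai/maddpg repository.

import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized policy: sees only its own agent's observation."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

class CentralizedCritic(nn.Module):
    """Centralized critic: sees all agents' observations and actions."""
    def __init__(self, total_obs_dim, total_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(total_obs_dim + total_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

# One independent actor (and critic) per agent, sized for a fixed agent count.
n_agents, obs_dim, act_dim = 3, 8, 2
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critics = [CentralizedCritic(n_agents * obs_dim, n_agents * act_dim)
           for _ in range(n_agents)]

obs = torch.randn(n_agents, obs_dim)                  # toy observations
acts = torch.stack([a(o) for a, o in zip(actors, obs)])
q_vals = [c(obs.flatten(), acts.flatten()) for c in critics]
print([q.item() for q in q_vals])

Because every actor and critic here is sized for a fixed number of agents, adding or removing agents requires rebuilding and retraining the networks, which is the over-specialization concern raised in the quote.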
“…(iii) Actor-critic methods aim to combine the advantages of actor-only and critic-only methods. For actor-critic [28], [29] algorithms, it was generally believed that learning value functions with large, non-linear function approximators was difficult and unstable.…”
Section: Hierarchical Communication Mechanism
confidence: 99%
“…Clearly, there exists a compromise between multiple indices, including prescribed performance and other control qualities. In practical applications, we should make a reasonable tradeoff between these indices [117, 157-169].…”
Section: The Approach for Handling the Perturbations of Tracking Errors
confidence: 99%
“…Shear Impedance Mode Control with Adaptive Fuzzy Compensation for Robot-Environment Interaction was investigated by Hu [14]. Actor-Critic Reinforcement Learning Control of Non-Strict Feedback Nonaffine Dynamic Systems was investigated by Bu [15]. Sadeq used an optimal control strategy to maximize electric vehicle hybrid energy storage system performance considering topographical information [16].…”
Section: Extended Fuzzy Adaptive Event-Trigger Compensation
confidence: 99%