2022
DOI: 10.1155/2022/1488344
A Hidden Attack Sequences Detection Method Based on Dynamic Reward Deep Deterministic Policy Gradient

Abstract: Attacker identification from network traffic is a common practice in cyberspace security management. However, network administrators cannot cover all security equipment due to cyberspace management cost constraints, which gives attackers the chance to escape the surveillance of network security administrators through legitimate actions and to perform attacks in both the physical and digital domains. Therefore, we propose a hidden attack sequence detection method based on reinforcement learning to deal with …
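The abstract and title point to a dynamic-reward variant of deep deterministic policy gradient (DDPG) for detecting hidden attack sequences. The paper's actual environment, state encoding, and reward schedule are not given on this page, so the sketch below is only a minimal illustration of how a DDPG update with a training-step-dependent ("dynamic") reward bonus for hidden, legitimate-looking attack steps could look. The state/action dimensions, network sizes, and the dynamic_reward() schedule are assumptions, not the authors' implementation.

```python
# Minimal sketch only: PyTorch DDPG update with a hypothetical "dynamic" reward
# term. Sizes, network shapes and the reward schedule are illustrative assumptions.
import copy
import math

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 32, 1      # assumed: traffic features -> suspicion score
GAMMA, TAU = 0.99, 0.005           # common DDPG defaults

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))

actor, critic = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM + ACTION_DIM, 1)
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def dynamic_reward(base_reward, step, hidden_flag, k=0.01):
    """Hypothetical dynamic reward: the bonus for correctly flagging a
    hidden (legitimate-looking) attack step grows as training proceeds."""
    return base_reward + hidden_flag * (1.0 - math.exp(-k * step))

def ddpg_update(batch, step):
    # batch: tensors sampled from a replay buffer, each of shape (B, ...)
    s, a, r0, s2, done, hidden = batch
    r = dynamic_reward(r0, step, hidden)

    # Critic: regress Q(s, a) onto the bootstrapped target.
    with torch.no_grad():
        a2 = torch.tanh(actor_t(s2))
        q_target = r + GAMMA * (1.0 - done) * critic_t(torch.cat([s2, a2], dim=1))
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), q_target)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Actor: ascend the critic's estimate of Q(s, pi(s)).
    actor_loss = -critic(torch.cat([s, torch.tanh(actor(s))], dim=1)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    # Polyak averaging of the target networks.
    with torch.no_grad():
        for net, tgt in ((actor, actor_t), (critic, critic_t)):
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.mul_(1.0 - TAU).add_(TAU * p)
```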

Cited by 4 publications (3 citation statements). References: 24 publications.
“…An upgrade of DDPG is accomplished as dynamic reward DDPG in [55], which shows 97.46% accuracy in detecting attackers. The authors of [56] propose a DDPG-based IDS approach that achieves a detection accuracy of 97.28% on the WUSTL-IIOT-2021 test set.…”
Section: RL Policy Gradient Algorithms and Its Variants
confidence: 99%
“…Security lapses include attacks from outside the company (external intrusions) as well as internal intrusions. In several recent studies, DRL has been used to defend systems against network intrusion attacks and to address this problem [51,58].…”
Section: Cyber Attack Intrusion Detection
confidence: 99%
“…In the actor-critic structure, the update of the actor policy depends on the critic value function [31–33]. Given the online actor parameter $\phi$, let $\phi_{\text{approx}}$ denote the updated actor parameter computed from the estimated maximal value function $\max_a Q_\theta(s, a)$, and let $\phi_{\text{true}}$ denote the parameter obtained using the true value function $Q^{\pi}(s, a)$, which is unknown during training and represents the value function in the ideal case; $\phi_{\text{approx}}$ and $\phi_{\text{true}}$ can then be expressed as in the following equation:…”
Section: Error Analysis
confidence: 99%
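The quoted passage ends before the equation it introduces. A plausible reconstruction, following the standard actor-critic overestimation analysis that this notation mirrors (with learning rate $\alpha$ and normalisation terms $Z_1$, $Z_2$ assumed here, not necessarily the citing paper's exact form), is:

\[
\phi_{\text{approx}} = \phi + \frac{\alpha}{Z_1}\,\mathbb{E}_{s \sim p_\pi}\!\left[\nabla_\phi \pi_\phi(s)\,\nabla_a Q_\theta(s,a)\,\big|_{a=\pi_\phi(s)}\right]
\]
\[
\phi_{\text{true}} = \phi + \frac{\alpha}{Z_2}\,\mathbb{E}_{s \sim p_\pi}\!\left[\nabla_\phi \pi_\phi(s)\,\nabla_a Q^{\pi}(s,a)\,\big|_{a=\pi_\phi(s)}\right]
\]

Comparing these two gradient-ascent steps makes explicit how the estimation error of the learned critic $Q_\theta$ relative to the true $Q^{\pi}$ propagates into the actor update.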