In recent years, penetration testing (pen-testing) has emerged as a
crucial process for evaluating the security level of network
infrastructures by simulating real-world cyber-attacks. Automating
pen-testing through reinforcement learning (RL) facilitates more
frequent assessments, minimizes human effort, and enhances scalability.
However, real-world pen-testing tasks often involve incomplete knowledge
of the target network system. Effectively managing this intrinsic
uncertainty by modeling pen-testing as a partially observable Markov
decision process (POMDP) remains a persistent challenge in the field.
Moreover, RL agents must learn intricate strategies to cope with partial
observability, which substantially increases computational cost and
training time. To address these issues, this study introduces EPPTA
(Efficient POMDP-Driven Penetration Testing Agent), an agent built on an
asynchronous RL framework, designed for conducting pen-testing tasks
within partially observable environments. We incorporate an implicit
belief module in EPPTA, grounded in the belief update formula of the
traditional POMDP model, which represents the agent’s probabilistic
estimation of the current environment state. Furthermore, by
integrating the algorithm with Sample Factory, a high-performance
asynchronous RL framework, EPPTA significantly reduces convergence time
compared with existing pen-testing methods, achieving an approximately
20-fold speedup.
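For reference, the belief update formula that the implicit belief module is grounded in is the standard POMDP Bayes filter: predict the next-state distribution through the transition model, then reweight by the observation likelihood and normalize. The sketch below illustrates this on hypothetical toy matrices (the state, action, and observation spaces shown are illustrative assumptions, not the paper's pen-testing model):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Standard POMDP belief update (Bayes filter).

    b: (S,)      current belief over states
    a: int       action just taken
    o: int       observation just received
    T: (A, S, S) transition probabilities T[a, s, s']
    O: (A, S, O) observation probabilities O[a, s', o]
    Returns the normalized posterior belief b'(s').
    """
    predicted = b @ T[a]             # predict: sum_s T(s'|s,a) b(s)
    unnorm = O[a, :, o] * predicted  # correct: weight by P(o|s',a)
    return unnorm / unnorm.sum()     # normalize to a distribution

# Toy 2-state example (hypothetical numbers for illustration).
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])        # one action
O = np.array([[[0.7, 0.3],
               [0.1, 0.9]]])        # two observations
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=0, T=T, O=O)
# Observation 0 is more likely in state 0, so the belief shifts there.
```

In EPPTA this update is not computed explicitly over an enumerated state space; the implicit belief module learns a representation playing the same role, which is what makes the approach tractable on larger networks.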
Empirical results across various pen-testing scenarios confirm that
EPPTA achieves higher task rewards and better scalability than prior
approaches, enabling efficient, large-scale assessment of network
infrastructure security.