Modern smart grids already consist of various components that interleave classical Operational Technology (OT) with Information and Communication Technology (ICT), which, in turn, have opened the power grid to advanced approaches using distributed software systems and even Artificial Intelligence (AI) applications. This IT/OT integration increases complexity, however, without argument, this advance is necessary to accommodate the rising numbers of prosumers, Distributed Energy Resources (DERs), to enable new market concepts, and to tackle world-wide CO2 emission goals. But the increasing complexity of the Critical National Infrastructure (CNI) power grid gives way to numerous new attack vectors such that a priori robustness cannot be guaranteed anymore and run-time resilience, especially against the “unknown unknowns”, is the focus of current research. In this article, we present a novel combination of so called misuse-case modelling and an approach based on Deep Reinforcement Learning (DRL) to analyze a power grid for new attack vectors. Our approach enables learning from domain knowledge (offline learning), while expanding on that knowledge through learning agents that eventually uncover new attack vectors.