This paper presents a control strategy for Cyber-Physical System defense developed within the European project ATENA, which concerns Critical Infrastructure (CI) protection. The controller aims to find the optimal security configuration, expressed as the set of countermeasures to implement, that addresses the system's vulnerabilities. The attack/defense problem is modeled as a multi-agent general-sum game in which the defender seeks to prevent as much damage as possible by finding an optimal trade-off between prevention actions and their costs. The problem is solved using Reinforcement Learning, and simulation results provide a proof of concept. In the zero-sum variant of the game, the defender of the protected CI minimizes the damage caused by its opponents by finding the Nash equilibrium. In the more general scenario, the defender drives the attacker into a position where the damage the attacker can cause to the infrastructure is lower than the cost the attacker must sustain to carry out the attack strategy.