Implementing control schemes for modular multilevel converters (M2Cs) involves both a cyber and a physical level, resulting in a cyber-physical system (CPS). At the cyber level, a communication network enables data exchange between sensors, control platforms, and monitoring systems. At the physical level, the control system switches ON/OFF the semiconductor devices that make up the M2C. In this context, most published works in this research area assume that the CPS always reports correct information. However, this may not hold when the M2C is subject to cyber-attacks such as the false data injection attack (FDIA), in which the data seen by the control system is corrupted through illegitimate data intrusion into the CPS. To address this situation, FDIA detectors for M2Cs have recently begun to be studied, with the goal of detecting and mitigating the attacks and identifying the attacked sub-modules. This paper proposes a reinforcement learning (RL)-based method to uncover the deficiencies of existing FDIA detectors used in M2C applications. The proposed method automatically generates complex attack sequences that are able to bypass FDIA detectors. In doing so, it exposes the weaknesses of current detectors; this information can later be used to improve detector performance and establish more reliable cybersecurity solutions for M2Cs. The RL environment is developed in MATLAB/Simulink augmented with the PLECS Blockset, and it is made available to researchers on a website to encourage future research in this area. Hardware-in-the-loop (HIL) studies verify the effectiveness of the proposed method.