Realizing smart batteries for electric vehicles (EVs) requires smart algorithms to control battery packs, in particular reconfigurable batteries. This work proposes a reinforcement learning (RL) algorithm that balances the State of Charge (SoC) of reconfigurable batteries based on the half-bridge and battery modular multilevel management (BM3) topologies. Amortized Q-learning (AQL) is implemented as the RL algorithm. It can handle the enormous number of possible configurations of a reconfigurable battery and allows a classical control approach to be combined with machine learning methods, which makes it possible to include safety mechanisms in the control. The neural network of the AQL is a feedforward neural network (FNN) with three hidden layers. An experimental evaluation on a 12-cell hybrid cascaded multilevel converter illustrates that the method can balance the SoC and maintain the balanced state during discharge. In this evaluation, balancing is 20.3% slower than with the classical approach. Nevertheless, AQL shows great potential for multi-objective optimization and as an applicable RL algorithm for control in power electronics.
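
The action-selection idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. Instead of evaluating every switching configuration, a small set of candidates is sampled, a rule-based check filters out unsafe ones, and a feedforward network with three hidden layers scores the rest. The uniform sampler (AQL would use a learned proposal), the layer widths, the `is_safe` rule, and all function names are assumptions made for illustration; only the 12 cells and the three hidden layers come from the text.

```python
import numpy as np

N_CELLS = 12  # from the text; all other constants below are assumptions
rng = np.random.default_rng(0)

def init_fnn(sizes, rng):
    """Random (untrained) weights; sizes = [input, h1, h2, h3, output]."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def fnn_forward(params, x):
    """ReLU on the three hidden layers, linear output (Q-value)."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

def sample_candidates(n_samples, n_active, rng):
    """Uniformly sample configurations with exactly n_active cells connected.
    Stand-in for the learned proposal distribution used in AQL."""
    cands = np.zeros((n_samples, N_CELLS))
    for row in cands:
        row[rng.choice(N_CELLS, size=n_active, replace=False)] = 1.0
    return cands

def is_safe(action, soc, soc_min=0.05):
    """Hypothetical classical rule: never connect a cell at or below soc_min."""
    return bool(np.all(soc[action == 1] > soc_min))

def select_action(params, soc, n_active, rng, n_samples=32):
    """Return the safe candidate with the highest Q-value, or None."""
    cands = sample_candidates(n_samples, n_active, rng)
    safe = np.array([is_safe(a, soc) for a in cands])
    if not safe.any():
        return None  # caller would fall back to a classical controller
    inputs = np.hstack([np.tile(soc, (n_samples, 1)), cands])
    q = fnn_forward(params, inputs).ravel()
    q[~safe] = -np.inf  # mask unsafe candidates before the argmax
    return cands[int(np.argmax(q))]

params = init_fnn([2 * N_CELLS, 64, 64, 64, 1], rng)
soc = rng.uniform(0.3, 0.9, N_CELLS)
action = select_action(params, soc, n_active=4, rng=rng)
```

Because the maximization runs over a fixed number of sampled candidates, the cost per decision does not grow with the total number of configurations, and the safety rule stays outside the learned part of the controller.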