Battery packs in electric vehicles are managed by battery management systems that balance the state of charge among the cells in the pack, a task that has received much attention in research. More recently, balancing the temperature among the cells has also become a research topic. In our work, we consider a dual-balancing problem in which we aim to balance both state of charge and temperature. We consider a smart battery pack in which individual cells can be bypassed, meaning that no current flows to or from the cell; a bypassed cell neither charges nor discharges and can therefore cool off. Moreover, a smart battery pack can estimate each cell's characteristics, which, in turn, can be used to define a model of cell and battery pack behavior. We conduct experiments on a model of a battery pack in which the cells differ in their configuration as an effect of aging. For such a pack with heterogeneous cells, we use Q-Learning in Uppaal Stratego to synthesize a controller that maximizes the time spent in a balanced state, meaning that all cells' states are within a specific range of each other. We show significant improvements in both state-of-charge and temperature balancing compared with two threshold-based controllers that each balance only one of the two. The synthesized controllers are unbalanced in state of charge for only 1% to 4% of the time and in temperature for 15% to 20% of the time. By contrast, the threshold-based controllers are unbalanced either in state of charge for as much as 37% of the time or in temperature for as much as 44% of the time. Finally, the maximum variations in state of charge and temperature among the cells are reduced.
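As a minimal illustration of the balance criterion, the following Python sketch checks whether all cells lie within a given range of each other and computes the fraction of samples in which they do not. It is not the Uppaal Stratego model used in this work: the function names, the uniformly sampled trace, the example values, and the 5-percentage-point threshold are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def is_balanced(values: Sequence[float], max_spread: float) -> bool:
    """Return True when all cell values lie within max_spread of each other."""
    return max(values) - min(values) <= max_spread


def unbalanced_fraction(trace: Sequence[Sequence[float]], max_spread: float) -> float:
    """Fraction of samples in which the pack is not balanced.

    Assumes the trace is sampled at uniform time steps, so the fraction of
    samples approximates the fraction of time spent unbalanced.
    """
    if not trace:
        raise ValueError("trace must contain at least one sample")
    unbalanced = sum(1 for sample in trace if not is_balanced(sample, max_spread))
    return unbalanced / len(trace)


# Hypothetical state-of-charge trace (percent) for a three-cell pack.
soc_trace = [
    [80.0, 79.0, 78.5],  # spread 1.5 -> balanced
    [75.0, 74.0, 69.0],  # spread 6.0 -> unbalanced
    [70.0, 69.5, 69.0],  # spread 1.0 -> balanced
    [65.0, 64.0, 63.5],  # spread 1.5 -> balanced
]

# Assumed threshold: cells must stay within 5 percentage points of each other.
print(unbalanced_fraction(soc_trace, max_spread=5.0))  # prints 0.25
```

The same predicate applies to temperature by passing per-cell temperatures and a temperature threshold; under this reading, a pack is balanced in the dual sense only when both checks hold at the same time.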