Optimizing the scheduling of power batteries can extend their service life, but existing heuristic algorithms adapt poorly, and because battery capacity fluctuates significantly during cycle aging, these algorithms easily become trapped in local optima. To overcome these problems, we take maximization of battery cycle life as the objective and propose a reinforcement learning scheduling optimization model, subject to temperature and internal resistance difference constraints, that decides whether to charge or discharge the battery during cycle aging. A deep-learning-based battery capacity estimation model serves as the agent's learning environment, the agent is trained with the Double DQN algorithm, and principal component analysis (PCA) is applied to reduce the dimension of the state space. Experiments on multiple publicly available battery aging data sets show that PCA and the constraint functions reduce the computation time needed to find the optimal solution and make larger reward values attainable. Moreover, the trained model effectively extends battery cycle life and adapts well: it automatically adjusts its parameters as the battery ages, developing optimal charging and discharging protocols for power batteries with different chemical compositions.
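Double DQN, the training algorithm named above, differs from vanilla DQN in how it forms the bootstrap target: the online network selects the greedy next action and the target network evaluates it, which reduces overestimation of Q-values. The Python sketch below shows only that target computation for a two-action (charge/discharge) agent. It is a generic illustration, not the authors' implementation. The `allowed_next` mask is a hypothetical way of imposing the temperature and internal resistance difference constraints (the abstract does not say how they are encoded), and all function and constant names are illustrative assumptions. PCA and the capacity-estimation environment are not modeled here.

```python
from typing import Optional, Sequence

# Illustrative action indices (hypothetical); each decision is charge vs. discharge.
CHARGE, DISCHARGE = 0, 1


def masked_argmax(q_values: Sequence[float], allowed: Sequence[bool]) -> int:
    """Return the index of the largest Q-value among the allowed actions."""
    best: Optional[int] = None
    for action, (q, ok) in enumerate(zip(q_values, allowed)):
        if ok and (best is None or q > q_values[best]):
            best = action
    if best is None:
        raise ValueError("no admissible action in this state")
    return best


def double_dqn_target(
    reward: float,
    done: bool,
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    allowed_next: Optional[Sequence[bool]] = None,
) -> float:
    """Double DQN bootstrap target for a single transition.

    The online network picks the greedy next action; the target network
    scores that action. `allowed_next` is a hypothetical constraint mask,
    e.g. derived from temperature / internal-resistance-difference limits.
    """
    if done:
        # Terminal transition (e.g. end of life reached): no bootstrapping.
        return reward
    if allowed_next is None:
        allowed_next = [True] * len(q_online_next)
    a_star = masked_argmax(q_online_next, allowed_next)
    return reward + gamma * q_target_next[a_star]


if __name__ == "__main__":
    q_online = [1.0, 2.0]  # online net prefers DISCHARGE in the next state
    q_target = [0.5, 0.3]  # target net's estimates for the next state
    print(double_dqn_target(1.0, False, 0.9, q_online, q_target))

    # Hypothetical constraint: forbid DISCHARGE in the next state.
    mask = [True, True]
    mask[DISCHARGE] = False
    print(double_dqn_target(1.0, False, 0.9, q_online, q_target, mask))
```

With these numbers, a vanilla DQN target would take the target network's maximum (0.5) and give 1.0 + 0.9 × 0.5 = 1.45; Double DQN instead evaluates the online network's choice (action 1) and gives 1.0 + 0.9 × 0.3 = 1.27. When the mask rules out discharging, the selected action becomes charging and the target returns to 1.45.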