Advances in goal-directed molecular optimization promise to identify feasible candidates for even the most challenging molecular design applications. However, several obstacles remain in applying these tools to practical problems, including lengthy computational or experimental evaluation, synthesizability considerations, and a vast potential search space. As an example of a fundamental design challenge with industrial relevance, we search for novel stable radical scaffolds for an aqueous redox flow battery that simultaneously satisfy redox requirements at the anode and cathode. To meet this challenge, we develop a new open-source molecular optimization framework based on AlphaZero, coupled with a fast, machine learning-derived surrogate objective trained on nearly 100,000 quantum chemistry simulations. The objective function comprises two graph neural networks: one that predicts adiabatic oxidation and reduction potentials, and a second that predicts electron density and local 3D environment, two properties previously shown to correlate with radical persistence and stability. With no hand-coded knowledge of organic chemistry, the reinforcement learning agent finds candidate molecules that satisfy a precise combination of redox, stability, and synthesizability requirements defined at the quantum chemistry level. Many of these candidates have reasonable predicted retrosynthetic pathways. The optimized molecules show that alternative stable radical scaffolds may offer a unique profile of stability and redox potentials to enable low-cost symmetric aqueous redox flow batteries.
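To make the idea of a combined surrogate objective concrete, the following is a minimal sketch of how predicted redox potentials and a predicted stability score might be folded into a single scalar reward for a search agent. It is not the framework's actual reward: the class `SurrogatePrediction`, the functions `window_penalty` and `reward`, the potential windows, and the stability weight are all hypothetical placeholders chosen for illustration, and the synthesizability term is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurrogatePrediction:
    """Hypothetical outputs of the two surrogate models for one candidate."""
    oxidation_potential: float  # volts, from the redox model
    reduction_potential: float  # volts, from the redox model
    stability_score: float      # unitless, higher means more persistent (assumed scale)


def window_penalty(value: float, low: float, high: float) -> float:
    """Return how far `value` lies outside [low, high]; zero inside the window."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def reward(
    pred: SurrogatePrediction,
    ox_window: tuple[float, float] = (0.5, 1.2),    # placeholder window, volts
    red_window: tuple[float, float] = (-0.5, 0.2),  # placeholder window, volts
    stability_weight: float = 0.01,                 # placeholder weight
) -> float:
    """Weighted stability score minus the total redox-window violation."""
    penalty = window_penalty(pred.oxidation_potential, *ox_window)
    penalty += window_penalty(pred.reduction_potential, *red_window)
    return stability_weight * pred.stability_score - penalty


if __name__ == "__main__":
    inside = SurrogatePrediction(0.8, -0.1, 50.0)   # both potentials in window
    outside = SurrogatePrediction(1.5, -0.1, 50.0)  # oxidation potential too high
    print(reward(inside), reward(outside))
```

In this sketch a candidate whose potentials fall inside both windows is ranked purely by its stability score, while any violation lowers the reward in proportion to its size, which gives the agent a smooth signal to follow toward the target region.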