Activated sludge wastewater treatment relies on several complex, nonlinear processes. Although activated sludge systems can provide high levels of treatment, including nutrient removal, operating them is often challenging and energy intensive. Significant research effort has gone in recent years into optimizing the control of such systems, through both domain knowledge and, more recently, machine learning. This study uses a novel interface between a common process modeling software package and a Python reinforcement learning (RL) environment to evaluate four common RL algorithms for their ability to minimize treatment energy use while maintaining effluent compliance within the Benchmark Simulation Model No. 1 (BSM1). Three of the algorithms tested, deep Q-learning, proximal policy optimization, and synchronous advantage actor critic, generally performed poorly across the scenarios tested. In contrast, the twin delayed deep deterministic policy gradient (TD3) algorithm consistently optimized control effectively while meeting the treatment requirements. With the best-performing selection of state observation features, TD3 control reduced aeration and pumping energy requirements by 14.3% compared to the BSM1 benchmark control, outperforming ammonia-based aeration control, an advanced domain-based control strategy, although future work is needed to improve the robustness of the RL implementation.
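The control objective described above, minimizing energy use while maintaining effluent compliance, could be expressed as a reward signal along the following lines. This is a minimal sketch under stated assumptions, not the reward used in this study: the function name `reward`, the `penalty_weight` parameter, the normalization by a baseline energy, and the choice of a linear penalty are all hypothetical. The effluent limits are the standard BSM1 constraint values; the study may enforce a different subset.

```python
# Hypothetical reward sketch; NOT the study's actual reward function.

# Standard BSM1 effluent limits (flow-weighted average concentrations, g/m^3).
EFFLUENT_LIMITS = {
    "N_tot": 18.0,  # total nitrogen
    "COD": 100.0,   # total chemical oxygen demand
    "S_NH": 4.0,    # ammonia nitrogen
    "TSS": 30.0,    # total suspended solids
    "BOD5": 10.0,   # five-day biochemical oxygen demand
}


def reward(aeration_kwh, pumping_kwh, effluent, baseline_kwh, penalty_weight=10.0):
    """Return a scalar reward: lower energy is better, violations are penalized.

    aeration_kwh, pumping_kwh: energy used over the evaluation window.
    effluent: dict of effluent concentrations keyed like EFFLUENT_LIMITS.
    baseline_kwh: energy of a reference controller, used for normalization
        (an assumption made here so the energy term is dimensionless).
    penalty_weight: assumed trade-off between energy and compliance.
    """
    # Energy term: -1.0 means "same energy as the reference controller".
    energy_term = -(aeration_kwh + pumping_kwh) / baseline_kwh

    # Compliance term: sum of relative exceedances over each limit.
    violation = sum(
        max(0.0, effluent[name] / limit - 1.0)
        for name, limit in EFFLUENT_LIMITS.items()
    )
    return energy_term - penalty_weight * violation
```

With a reward of this shape, an agent that matches the reference energy use and meets every limit scores -1.0, and any energy saving raises the score only as long as no limit is exceeded. The relative weighting of the two terms is a design choice that would need tuning for a real plant model.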