This paper considers a multi-robot team tasked with monitoring an environmental field of interest over long time horizons. The approach is based on a control-theoretic measure of the information collected by the robots, namely a norm of the constructability Gramian. This measure is used to learn a distributed multi-robot control policy via reinforcement learning. The learned policy is then combined with energy constraints using the constraint-driven control framework to achieve persistent environmental monitoring. The proposed approach is tested in a simulated persistent environmental monitoring scenario in which a team of robots with limited energy availability is controlled in a coordinated fashion to estimate the concentration of a gas diffusing in the environment.
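To make the information measure concrete, the following is a minimal sketch of a constructability Gramian and one possible norm of it. It is an illustration under stated assumptions, not the formulation used in the paper: the field is assumed to follow discrete-time linear dynamics x_{k+1} = A x_k with A invertible, the measurement matrices C_k are assumed to depend on where the robots are at step k, and the trace is chosen as the norm purely for simplicity. The function name `constructability_gramian` and all numerical values are hypothetical.

```python
import numpy as np


def constructability_gramian(A, C_seq):
    """Discrete-time constructability Gramian over steps k = 0..N.

    Computes W = sum_k Phi(k, N)^T C_k^T C_k Phi(k, N) with
    Phi(k, N) = A^(k - N), which requires A to be invertible.
    (Assumed model: x_{k+1} = A x_k, y_k = C_k x_k.)
    """
    n = A.shape[0]
    A_inv = np.linalg.inv(A)
    W = np.zeros((n, n))
    Phi = np.eye(n)  # Phi(N, N) = I
    # Walk backwards in time: k = N, N-1, ..., 0.
    for C in reversed(C_seq):
        W += Phi.T @ C.T @ C @ Phi
        Phi = Phi @ A_inv  # Phi(k-1, N) = A^(k-1-N)
    return W


# Hypothetical two-cell field with diffusion-like coupling between cells.
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])

# Hypothetical measurement history: a robot senses cell 0 twice, then cell 1.
C_seq = [np.array([[1.0, 0.0]]),
         np.array([[1.0, 0.0]]),
         np.array([[0.0, 1.0]])]

W = constructability_gramian(A, C_seq)

# One possible scalar information measure (an assumption, not the paper's choice).
info = np.trace(W)
```

A full-rank `W` indicates that the current field state can be reconstructed from the measurement history, so a scalar such as `info` could serve as a reward signal that encourages the robots to visit informative locations.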