In the last decade, reinforcement learning (RL) has been used to solve several tasks with human-level performance. However, there is a growing demand for interpretable RL, i.e., for understanding how an RL agent works and the rationale behind its decisions. Not only do we need interpretability to assess the safety of such agents, but we may also need it to gain insights into unknown problems. In this work, we propose a novel optimization approach to interpretable RL that builds decision trees. While techniques that optimize decision trees for RL do exist, they usually employ greedy algorithms or do not exploit the rewards given by the environment; as a consequence, they may either get stuck in local optima or be inefficient. In contrast, our approach is based on a two-level optimization scheme that combines the advantages of evolutionary algorithms with the benefits of Q-learning. This method decomposes the problem into two sub-problems: finding a meaningful decomposition of the state space, and associating an action with each subspace. We test the proposed method on three well-known RL benchmarks, as well as on a pandemic control task, on which it proves competitive with the state of the art in both performance and interpretability.
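To make the two-level scheme concrete, the following is a minimal sketch, not the authors' implementation: an outer evolutionary loop searches over decision-tree splits of the state space, while an inner Q-learning loop learns an action value for each resulting leaf. The toy one-dimensional environment, the flat threshold encoding of the tree, and all hyperparameters are illustrative assumptions.

```python
import random

N_ACTIONS = 2          # 0: move left, 1: move right
GOAL, EPISODE_LEN = 0.9, 50

def step(state, action):
    """Toy 1-D environment: reward is collected when the position reaches GOAL."""
    state = min(max(state + (0.1 if action == 1 else -0.1), 0.0), 1.0)
    reward = 1.0 if state >= GOAL else 0.0
    return state, reward

def leaf_index(thresholds, state):
    """Sorted thresholds partition [0, 1] into intervals, i.e., the tree's leaves."""
    return sum(state >= t for t in thresholds)

def evaluate(thresholds, episodes=30, alpha=0.5, gamma=0.9, eps=0.1):
    """Inner level: Q-learning over the leaves induced by the thresholds."""
    q = [[0.0] * N_ACTIONS for _ in range(len(thresholds) + 1)]
    total = 0.0
    for _ in range(episodes):
        s = 0.0
        for _ in range(EPISODE_LEN):
            leaf = leaf_index(thresholds, s)
            a = random.randrange(N_ACTIONS) if random.random() < eps \
                else max(range(N_ACTIONS), key=lambda i: q[leaf][i])
            s2, r = step(s, a)
            leaf2 = leaf_index(thresholds, s2)
            q[leaf][a] += alpha * (r + gamma * max(q[leaf2]) - q[leaf][a])
            total += r
            s = s2
    return total / episodes  # average return: fitness of this state decomposition

def evolve(generations=20, pop_size=10, n_splits=2):
    """Outer level: evolve the split thresholds (the state-space decomposition)."""
    pop = [sorted(random.random() for _ in range(n_splits)) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=evaluate, reverse=True)[: pop_size // 2]
        # Refill the population with Gaussian-mutated copies of elite individuals.
        pop = elite + [sorted(min(max(t + random.gauss(0, 0.05), 0.0), 1.0)
                              for t in random.choice(elite))
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=evaluate)

if __name__ == "__main__":
    best = evolve()
    print("Best split thresholds:", [round(t, 2) for t in best])
```

In this sketch the evolutionary search only handles where the state space is split, while Q-learning handles which action each leaf should take, mirroring the decomposition into the two sub-problems described above.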