Selected configuration interaction (sCI) methods exploit the sparsity of the full configuration interaction (FCI) wave function, yielding significant computational savings and wave function compression without sacrificing accuracy. Despite recent advances in sCI methods, the selection of important determinants remains an open problem. We explore the possibility of using reinforcement learning approaches to solve the sCI problem. By mapping the configuration interaction problem onto a sequential decision-making process, we allow the agent to learn on the fly which determinants to include and which to ignore, yielding a compressed wave function at near-FCI accuracy. This method, which we call reinforcement-learned configuration interaction (RLCI), adds another weapon to the sCI arsenal and highlights how reinforcement learning approaches can help solve challenging problems in electronic structure theory.
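To make the sequential decision-making framing concrete, the following is a minimal sketch, not the RLCI algorithm itself: a toy Hamiltonian stands in for the FCI matrix, the "state" is the current set of selected determinants, the "action" is adding one more determinant, and the energy lowering serves as the reward for a simple epsilon-greedy value update. The reward shaping, state encoding, and policy here are illustrative assumptions.

```python
# Hedged sketch: determinant selection as a sequential decision process.
# The Hamiltonian, reward, and epsilon-greedy policy are illustrative
# assumptions, not the RLCI method described in the abstract.
import numpy as np

rng = np.random.default_rng(0)

# Toy "FCI" Hamiltonian over n_det determinants (symmetric random matrix
# standing in for <D_i|H|D_j>); its lowest eigenvalue is the FCI energy.
n_det = 40
H = rng.normal(size=(n_det, n_det))
H = 0.5 * (H + H.T)
np.fill_diagonal(H, np.sort(rng.normal(loc=0.0, scale=5.0, size=n_det)))
e_fci = np.linalg.eigvalsh(H)[0]

def energy(subspace):
    """Variational ground-state energy within the selected determinant subspace."""
    idx = sorted(subspace)
    return np.linalg.eigvalsh(H[np.ix_(idx, idx)])[0]

# Learned value estimate (expected energy lowering) for adding each determinant.
q = np.zeros(n_det)
alpha, epsilon = 0.3, 0.2                     # learning rate, exploration rate
selected = {int(np.argmin(np.diag(H)))}       # seed with the lowest-diagonal determinant

# One episode: grow the subspace one determinant per step (sequential decisions).
for step in range(15):
    candidates = [i for i in range(n_det) if i not in selected]
    if rng.random() < epsilon:                # explore a random candidate
        action = int(rng.choice(candidates))
    else:                                     # exploit the learned values
        action = max(candidates, key=lambda i: q[i])
    e_old = energy(selected)
    selected.add(action)
    reward = e_old - energy(selected)         # energy lowering as reward
    q[action] += alpha * (reward - q[action]) # simple incremental value update

print(f"selected {len(selected)}/{n_det} determinants, "
      f"E = {energy(selected):.4f}, E_FCI = {e_fci:.4f}")
```

In this toy setting the agent compresses the wave function by keeping only the determinants whose inclusion lowers the variational energy the most; the actual RLCI selection criterion and learning scheme are described in the main text.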