“…Other improvements to the RLCI method include modifying the action space to allow more than one determinant to be added to or removed from the state, optimizing the learning rate and the discount factor, and gaining a better understanding of the trade-off between exploration and exploitation. Additional investigations of perturbative corrections on top of the RLCI-learned wave function may also yield robust convergence to the FCI limit with compact reference wave functions, as has been observed in other sCI methods [3,16,46–48]…”