Chemical reactions are dynamical processes involving the correlated reorganization of atomic configurations, driving the conversion of an initial reactant into a resulting product. Because both the reactants and the products are metastable, chemical reactions are rare events whose transitions, once initiated, proceed fleetingly. Reaction pathways can be modelled probabilistically using the notion of reactive density in the phase space of the molecular system. This density is related to the committor function, which gives the probability that a trajectory started from a given configuration reaches the product state before the reactant state. In theory, the committor function can be obtained by solving the backward Kolmogorov equation, a partial differential equation defined in the full-dimensional phase space. However, solving this equation with traditional methods is impractical for high-dimensional systems. In this work, we propose a reinforcement-learning-based method to identify important configurations that connect reactant and product states along chemical reaction paths. By shooting multiple trajectories from these configurations, we generate an ensemble of states that concentrates on the transition path ensemble. This configuration ensemble can be effectively employed in a neural-network-based partial differential equation solver to obtain an approximate solution of a restricted backward Kolmogorov equation, even when the dimension of the problem is very high. The resulting solution approximates the committor function, which encodes mechanistic information about the reaction, opening a new route to understanding complex chemical reactions and evaluating reaction rates.
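
To make the committor function concrete, the following minimal sketch treats a toy case that is not the method proposed in this work. It assumes one-dimensional overdamped Langevin dynamics, for which the backward Kolmogorov equation reduces to q'' - beta V' q' = 0 with q(a) = 0 and q(b) = 1, and the solution is q(x) = ∫_a^x exp(beta V(y)) dy / ∫_a^b exp(beta V(y)) dy. The double-well potential, the inverse temperature `beta`, the boundary points `a` and `b`, and all function names are illustrative assumptions.

```python
import numpy as np


def double_well(x):
    """Hypothetical 1D potential with minima at x = -1 and x = +1."""
    return (x**2 - 1.0) ** 2


def committor_1d(potential, a, b, beta, n=2001):
    """Committor q(x) on [a, b] for 1D overdamped Langevin dynamics (assumed).

    q(x) is the probability of reaching b before a when starting from x.
    It is evaluated from the closed form
        q(x) = int_a^x exp(beta V) dy / int_a^b exp(beta V) dy
    using a cumulative trapezoidal rule on a uniform grid.
    """
    x = np.linspace(a, b, n)
    w = np.exp(beta * potential(x))
    dx = x[1] - x[0]
    # Cumulative integral of w from a to each grid point.
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (w[1:] + w[:-1]) * dx)))
    return x, cum / cum[-1]


# Reactant at x = -1, product at x = +1, illustrative inverse temperature.
x, q = committor_1d(double_well, -1.0, 1.0, beta=5.0)
```

In this symmetric toy potential the committor rises monotonically from 0 at the reactant to 1 at the product and crosses 1/2 at the barrier top, x = 0. A direct quadrature of this kind is only feasible in very low dimension, which is the limitation the sampling and neural-network-based solver described above are meant to address.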