This research proposes a novel approach to global path and resource planning for lunar rovers. The proposed method incorporates a range of constraints, including static, time-variant, and path-dependent factors related to environmental conditions and the rover’s internal resource status. These constraints are integrated into a grid map as a penalty function, and a reinforcement learning-based framework is employed to address the resource constrained shortest path problem (RCSP). Compared to existing approaches referenced in the literature, our proposed method enables the simultaneous consideration of a broader spectrum of constraints. This enhanced flexibility leads to improved path search optimality. To evaluate the performance of our approach, this research applied the proposed learning architecture to lunar rover path search problems, generated based on real lunar digital elevation data. The simulation results demonstrate that our architecture successfully identifies a rover path while consistently adhering to user-defined environmental and rover resource safety criteria across all positions and time epochs. Furthermore, the simulation results indicate that our approach surpasses conventional methods that solely rely on environmental constraints.