Identifying critical paths is crucial to reducing the complexity of performance analysis and reliability calculation for logic circuits. In this paper, we propose a method for identifying the critical path in a combinational circuit using a reinforcement learning framework, which enhances the method's applicability and compatibility. First, we configure the learning environment of the model from circuit structure information, so that useful information is available for timely decision-making. Second, we employ the upper confidence bound applied to trees (UCT) algorithm to construct the behavior decision strategy of the model, which avoids invalid traversal and reduces computing costs. Third, we construct a goal-oriented reward and punishment function based on the distance from the circuit's primary outputs. Finally, using a parallel computing strategy, we construct an adaptive training method that improves the model's prediction accuracy under finite sampling, which accelerates convergence and enhances the quality of the model. Experimental results on benchmark circuits show that, with the functional timing analysis method as the reference, the proposed method achieves an average accuracy of 99.39%, and its average speed for a single calculation is 18.07 times faster than that of the reference method. Compared with the Monte Carlo model, the proposed method has a higher critical path hit rate, and its average calculation speed is 928.75 times faster.
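For readers unfamiliar with UCT, the selection rule at its core can be sketched as follows. This is a minimal illustration of the standard UCB1-based child selection, not the decision strategy or reward function of the proposed method. The names `uct_score` and `select_child`, the dictionary layout, the gate labels, and the exploration constant `c` are all assumptions made for this example.

```python
import math


def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 score: mean reward plus an exploration bonus.

    Unvisited candidates get an infinite score so each is tried once.
    """
    if visits == 0:
        return math.inf
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, c=math.sqrt(2)):
    """Pick the candidate with the highest UCT score.

    `children` is a list of dicts with hypothetical keys
    'name', 'reward' (accumulated reward) and 'visits'.
    The parent visit count is taken as the sum of child visits.
    """
    parent_visits = sum(ch["visits"] for ch in children)
    return max(
        children,
        key=lambda ch: uct_score(ch["reward"], ch["visits"], parent_visits, c),
    )


if __name__ == "__main__":
    # Hypothetical fan-out gates reachable from the current node.
    candidates = [
        {"name": "g1", "reward": 9.0, "visits": 10},
        {"name": "g2", "reward": 0.5, "visits": 1},
    ]
    print(select_child(candidates)["name"])
```

With `c = 0` the rule is purely greedy and follows the best mean reward. A positive `c` shifts selection toward rarely visited candidates, which is how UCT balances exploring new paths against exploiting known ones.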