This paper proposes a novel Monte Carlo tree search (MCTS) algorithm for the protein folding problem in the HP model. There are two main challenges. First, the problem has been proven NP-complete. The solution space is large, and it is hard to find a good solution with a search algorithm that has no prior knowledge of the HP model. We tackle this challenge by formulating the problem as a deterministic Markov decision process (MDP) and solving it in an AlphaZero-like manner. The difference is that we design an MCTS algorithm with two stages: a neural exploitation stage and a random exploration stage. In the first stage, the search algorithm exploits knowledge from previous experience by evaluating states with a trained neural network; in the second stage, states are evaluated by fast random rollouts. This two-stage design effectively reduces the number of neural inferences and the computational cost. The second challenge is that the evaluation used in typical MCTS cannot preserve the correct preference over actions in our task. To address this challenge, we propose an over-sampling mechanism that encourages the agent to search more on actions with high rollout values. The proposed method is tested and compared in a series of experiments, and the results empirically confirm its effectiveness. We also visualize the best conformations obtained and verify the proposed technical designs through an ablation study. © 2022 Institute of Electrical Engineers of Japan. Published by Wiley Periodicals LLC.
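As a rough illustration of the deterministic MDP view and the random rollout evaluation mentioned in the abstract, the following Python sketch may help. It assumes a 2D square lattice, absolute moves (up, down, left, right), and a reward equal to the number of H-H contacts between residues that are not adjacent in the chain. These choices, and every function name below, are illustrative assumptions and not the paper's implementation. The neural exploitation stage and the over-sampling mechanism are not modeled here.

```python
import random

# Assumption: absolute moves on a 2D square lattice (not specified in the abstract).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def legal_actions(coords):
    """Return the moves from the last placed residue that keep the walk self-avoiding."""
    x, y = coords[-1]
    occupied = set(coords)
    return [a for a, (dx, dy) in MOVES.items() if (x + dx, y + dy) not in occupied]


def step(coords, action):
    """Deterministic transition: place the next residue one lattice step away."""
    dx, dy = MOVES[action]
    x, y = coords[-1]
    return coords + [(x + dx, y + dy)]


def hh_contacts(sequence, coords):
    """Count H-H pairs that are lattice neighbours but not chain neighbours."""
    index = {pos: i for i, pos in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each pair is counted exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


def random_rollout(sequence, coords, rng):
    """Complete a partial conformation with uniformly random legal moves.

    Returns the H-H contact count of the finished fold, or 0 if the
    walk traps itself before every residue is placed (an assumed convention).
    """
    coords = list(coords)
    while len(coords) < len(sequence):
        actions = legal_actions(coords)
        if not actions:
            return 0
        coords = step(coords, rng.choice(actions))
    return hh_contacts(sequence, coords)
```

For example, folding the sequence `HPPH` into a unit square, `[(0, 0), (1, 0), (1, 1), (0, 1)]`, brings the two H residues next to each other and yields one contact. Averaging `random_rollout` over several calls from the same partial conformation gives a cheap value estimate of the kind a random exploration stage could use.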