Match-3 puzzle games have garnered significant popularity across all age groups due to their simplicity, non-violent nature, and concise gameplay. However, developing captivating and well-balanced stages for match-3 puzzle games remains a challenging task for game developers. This study aims to identify the optimal reinforcement learning algorithm for streamlining the level balancing verification process in match-3 games by comparing the Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) algorithms. By training an agent with each of these two algorithms, the study investigates which approach yields more efficient and effective difficulty balancing test results. Based on a comparative analysis of cumulative rewards and entropy, the findings indicate that SAC is the better choice for creating an efficient agent capable of verifying the difficulty balance of stages in a match-3 puzzle game: SAC demonstrated superior learning performance and greater stability, both of which are critical for stage difficulty balancing in match-3 gameplay. This study is expected to contribute to the development of improved level balancing techniques for match-3 puzzle games, as well as to enhance the overall gaming experience for players.