To improve the efficiency and precision of multi-objective optimization in secondary-growth Pinus yunnanensis forests in southwestern China, this study combined spatial and non-spatial structural indicators to define the objective function and constraints used to assess forest structure. Trees were selected for felling with the random selection method (RSM), the Q-value method (QVM), and the V-map method (VMM). Each tree-selection action taken to optimize the forest stand structure (FSS) was treated as a decision made by a reinforcement learning (RL) agent, and the agent's decision-making was refined iteratively through RL's trial-and-error strategy before being applied to the multi-objective optimization problem. Simulated felling experiments on four circular sample plots (P1–P4) compared RL with Monte Carlo (MC) and particle swarm optimization (PSO) for FSS optimization. The value of the objective function (VOF) improved notably on all plots. RL-based strategies gave the largest gains, raising the VOF by 17.24%, 44.92%, 34.66%, and 17.10% for P1–P4, respectively, compared with 10.73%, 41.54%, 30.39%, and 15.07% for MC-based strategies and 14.08%, 37.78%, 26.17%, and 16.23% for PSO-based strategies. The hybrid M7 scheme, which integrates RL with the RSM, outperformed the other schemes on every plot, with an average VOF increase of 26.81% against an average of 17.42% across all schemes. These results indicate that RL, particularly when combined with the RSM, offers stronger optimization performance than MC and PSO for Pinus yunnanensis secondary forests and has potential for optimizing sustainable forest management strategies.
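The per-plot comparison reported above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the study's code: it only tabulates the VOF increases quoted in the abstract, and the names `VOF_INCREASE`, `margin_over`, and `best_method_per_plot` are hypothetical helpers introduced here for illustration.

```python
# Reported VOF increases (%) per plot, copied from the abstract.
# All identifiers below are illustrative assumptions, not the authors' code.
PLOTS = ["P1", "P2", "P3", "P4"]
VOF_INCREASE = {
    "RL": [17.24, 44.92, 34.66, 17.10],
    "MC": [10.73, 41.54, 30.39, 15.07],
    "PSO": [14.08, 37.78, 26.17, 16.23],
}


def margin_over(baseline, reference="RL"):
    """Percentage-point gap between the reference method and a baseline, per plot."""
    return {
        plot: round(ref - base, 2)
        for plot, ref, base in zip(
            PLOTS, VOF_INCREASE[reference], VOF_INCREASE[baseline]
        )
    }


def best_method_per_plot():
    """Method with the largest reported VOF increase on each plot."""
    return {
        plot: max(VOF_INCREASE, key=lambda method: VOF_INCREASE[method][i])
        for i, plot in enumerate(PLOTS)
    }


if __name__ == "__main__":
    for baseline in ("MC", "PSO"):
        print(f"RL minus {baseline}:", margin_over(baseline))
    print("Best per plot:", best_method_per_plot())
```

On the reported figures, RL leads on every plot, with margins ranging from 0.87 percentage points (P4, against PSO) to 8.49 percentage points (P3, against PSO).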