Tracking control of the output probability density function presents significant challenges, particularly when the system model is unknown and multiplicative noise disturbances are present. To address these challenges, this paper introduces a novel tracking control algorithm based on reinforcement Q-learning. First, a B-spline model is employed to represent the original system, thereby transforming the control problem into a state weight tracking problem within the B-spline stochastic system model. Then, to handle the unknown stochastic system dynamics and the multiplicative noise, a model-free reinforcement Q-learning algorithm is developed to solve the tracking control problem. Finally, the effectiveness of the proposed algorithm is validated through comprehensive simulation examples.
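As a hedged illustration of the modeling step described above (the symbols $y$, $u_k$, $v_i$, and $B_i$ are assumed here for exposition and are not taken verbatim from the paper), the output probability density function is commonly approximated in B-spline stochastic distribution control by a weighted sum of fixed basis functions:
\begin{equation*}
  \gamma(y, u_k) \;\approx\; \sum_{i=1}^{n} v_i(u_k)\, B_i(y),
  \qquad \int_{a}^{b} \gamma(y, u_k)\,\mathrm{d}y = 1,
\end{equation*}
so that, under this kind of approximation, tracking a target density reduces to tracking the weight vector $v(u_k) = [v_1(u_k), \dots, v_n(u_k)]^{\top}$, which is the state weight tracking problem referred to in the abstract.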