In this article, a scalable reinforcement learning (RL)-based technique is presented to control probabilistic Boolean control networks (PBCNs). In particular, a double deep-Q network (DDQN) approach is first proposed to address the output tracking problem of PBCNs, and optimal state feedback controllers are obtained such that the output of a PBCN tracks both a constant and a time-varying reference signal. The presented method is model-free and scalable, thereby providing an efficient way to control large-scale PBCNs, which are a natural choice for modeling gene regulatory networks (GRNs). Finally, three PBCN models of GRNs, including a 16-gene and a 28-gene network, are considered to verify the presented results.

INDEX TERMS Double deep-Q learning, model-free technique, output tracking, probabilistic Boolean control networks, scalability.
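To illustrate the core mechanism behind the DDQN approach named above, the following is a minimal sketch of the double deep-Q bootstrap target, which decouples action selection (online network) from action evaluation (target network) to reduce the overestimation bias of plain DQN. Tabular Q-value arrays stand in for the two networks here, and all names (`ddqn_target`, `q_online`, `q_target`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ddqn_target(q_online, q_target, reward, next_state, gamma=0.99):
    """Compute the double deep-Q (DDQN) bootstrap target for one transition.

    The greedy action is selected with the online Q-values but evaluated
    with the target Q-values, the defining feature of DDQN.
    """
    best_action = int(np.argmax(q_online[next_state]))        # select with online Q
    return reward + gamma * q_target[next_state, best_action]  # evaluate with target Q

# Toy example: 2 states, 2 actions.
q_online = np.array([[1.0, 2.0], [0.5, 3.0]])
q_target = np.array([[1.1, 1.9], [0.4, 2.5]])
# Online net picks action 1 in state 1; target net evaluates it: 1 + 0.9 * 2.5 = 3.25
print(ddqn_target(q_online, q_target, reward=1.0, next_state=1, gamma=0.9))
```

In an output tracking setting, the per-step reward would typically penalize the distance between the PBCN output and the reference signal, so maximizing the return drives the output toward the reference.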