Gas collector pressure control of coke ovens is a complex industrial process with nonlinear, time-varying and strongly coupled characteristics. Moreover, the gas collector pressure is sensitive to the environment, and its values vary irregularly and dramatically. Keeping the gas collector pressure within a given range has an important effect on gas utilization efficiency and production safety. In recent years, many intelligent control technologies have been applied to actual industrial production and have achieved good control performance. A hybrid multilevel intelligent control strategy was proposed in Ref. [1]: a fuzzy control algorithm with variable and adjustable factors was adopted at the loop control level, manifold suction supervisory control was adopted at the main control level, and in-group and between-group decoupling rules were adopted at the decoupling level. Based on the characteristics of suction control and gas pressure in a coke oven blast cooling system, a new fuzzy adaptive PID control algorithm for gas collector pressure control of coke ovens was presented in Ref. [2], in which a fuzzy decoupling method was adopted for decoupling control. In Ref. [3], based on an analysis of the complex pressure coupling relationship between gas collectors, a rule-based compensation decoupling algorithm for the parallel production of coke ovens was proposed. Reinforcement learning is widely used in the fields of artificial intelligence and machine learning [4][5][6][7]. A reinforcement learning algorithm receives reinforcement signals for the current action from the environment and evaluates the effect of that action on the future, without requiring a mathematical model or prior knowledge of the system dynamics. Because little information is available from the external environment, reinforcement learning must rely on its