To avoid safety accidents caused by earth pressure imbalance during shield machine tunneling, the earth pressure in the chamber must be kept in balance with that on the excavation face, which is difficult to achieve in practical engineering. Therefore, a data-driven multi-variable optimization method based on dual heuristic programming (DHP) is proposed. First, a cost function with respect to the chamber’s earth pressure is formulated according to Bellman’s principle of optimality. Then, the action network, model network and critic network that compose the DHP controller are established using back propagation neural networks (BPNN), and their weights are updated by the gradient descent algorithm. By minimizing the cost function, the action network uses the critic network’s error to optimize the control variables, yielding the optimal advance speed, cutter head torque, cutter head speed, total thrust and screw conveyor speed. Finally, simulation experiments are carried out, and the results indicate that the method can effectively maintain the earth pressure balance in the chamber and has strong anti-interference ability.
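
For orientation, the relations on which a DHP controller of this kind is typically built can be sketched as follows. This is the generic textbook form, not necessarily the exact formulation used in this work. All symbols are assumptions introduced for illustration: x(t) is the state (for example the chamber earth pressure), u(t) is the vector of control variables, U is a utility penalizing deviation from the target pressure, and γ is a discount factor.

```latex
% Generic DHP relations (textbook form); every symbol here is an assumption,
% not notation taken from the paper. Requires the amsmath package.
% x(t): state, u(t): control vector, U: utility, \gamma \in (0,1]: discount factor.
\begin{align}
  % Bellman recursion for the cost-to-go
  J(t) &= \sum_{k=0}^{\infty} \gamma^{k}\, U\big(x(t+k), u(t+k)\big)
        = U\big(x(t), u(t)\big) + \gamma\, J(t+1), \\
  % Quantity approximated by the critic network in DHP (the costate)
  \lambda(t) &= \frac{\partial J(t)}{\partial x(t)}, \\
  % Critic training target, obtained by differentiating the Bellman recursion
  \lambda^{*}(t) &= \frac{\partial U(t)}{\partial x(t)}
     + \left(\frac{\partial u(t)}{\partial x(t)}\right)^{\!\top}
       \frac{\partial U(t)}{\partial u(t)}
     + \gamma \left( \frac{\partial x(t+1)}{\partial x(t)}
     + \frac{\partial x(t+1)}{\partial u(t)}\,
       \frac{\partial u(t)}{\partial x(t)} \right)^{\!\top} \lambda(t+1), \\
  % Critic error minimized by gradient descent
  E_c(t) &= \tfrac{1}{2}\, \big\lVert \lambda(t) - \lambda^{*}(t) \big\rVert^{2}, \\
  % Optimality condition that the action network is trained to satisfy
  0 &= \frac{\partial U(t)}{\partial u(t)}
     + \gamma \left(\frac{\partial x(t+1)}{\partial u(t)}\right)^{\!\top} \lambda(t+1).
\end{align}
```

In this reading, the model network supplies the Jacobians of x(t+1) with respect to x(t) and u(t), the critic network is trained to reduce E_c, and the action network is adjusted by gradient descent to drive the last expression toward zero.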