The main gearbox is critical to the operational safety of helicopters, and its oil temperature reflects the health state of the gearbox; an accurate oil temperature forecasting model is therefore an important step toward reliable fault detection. First, to forecast gearbox oil temperature accurately, an improved deep deterministic policy gradient (DDPG) algorithm with a CNN–LSTM base learner is proposed, which captures the complex relationship between oil temperature and working conditions. Second, a reward incentive function is designed to reduce training time and stabilize the model. In addition, a variable-variance exploration strategy is proposed so that the agents of the model fully explore the state space in the early stage of training and gradually converge in the later stage. Third, a multi-critic network structure is adopted to address inaccurate Q-value estimation, which is key to improving the prediction accuracy of the model. Finally, the residual error is smoothed with an exponentially weighted moving average (EWMA), and kernel density estimation (KDE) is used to determine the fault threshold against which the smoothed residual is judged abnormal. The experimental results show that the proposed model achieves higher prediction accuracy and shorter fault detection time.
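The final detection step (EWMA smoothing of the residual followed by a KDE-based threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the smoothing factor `lam`, the 0.99 confidence level, the use of absolute residuals, the Gaussian kernel, and the normal-reference bandwidth rule are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import numpy as np


def ewma(residuals, lam=0.2):
    """EWMA recursion z_t = lam * r_t + (1 - lam) * z_{t-1}, with z_0 = r_0.

    lam is an assumed smoothing factor, not a value from the paper.
    """
    r = np.asarray(residuals, dtype=float)
    z = np.empty_like(r)
    z[0] = r[0]
    for t in range(1, len(r)):
        z[t] = lam * r[t] + (1.0 - lam) * z[t - 1]
    return z


def kde_cdf(x, samples, bandwidth):
    """CDF of a Gaussian KDE: the mean of normal CDFs centred on each sample."""
    u = (x - samples) / (bandwidth * math.sqrt(2.0))
    return float(np.mean([0.5 * (1.0 + math.erf(v)) for v in u]))


def kde_threshold(samples, confidence=0.99):
    """Upper threshold at which the KDE CDF reaches `confidence`.

    The bandwidth uses the normal-reference rule of thumb (an assumption);
    the quantile is located by bisection on the KDE CDF.
    """
    s = np.asarray(samples, dtype=float)
    h = 1.06 * s.std(ddof=1) * len(s) ** (-1.0 / 5.0)
    lo, hi = s.min() - 10.0 * h, s.max() + 10.0 * h
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, s, h) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic stand-in for |predicted - measured| oil temperature residuals.
rng = np.random.default_rng(0)
healthy = np.abs(rng.normal(0.0, 1.0, 500))
threshold = kde_threshold(ewma(healthy), confidence=0.99)

# A simulated fault: the residual drifts upward over the last 100 samples.
drift = np.concatenate([np.zeros(100), np.linspace(0.0, 10.0, 100)])
faulty = np.abs(rng.normal(0.0, 1.0, 200) + drift)
alarms = np.flatnonzero(ewma(faulty) > threshold)
first_alarm = int(alarms[0]) if alarms.size else None
```

In this sketch the threshold is fitted on residuals from healthy operation only, and an alarm is raised at the first sample whose smoothed residual exceeds it. A smaller `lam` smooths more aggressively, which suppresses false alarms at the cost of a longer detection delay.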