The fatigue state of shield machine drivers directly affects the safe operation and tunneling efficiency of shield machines during metro construction. Two problems motivate this work: driving simulation platforms struggle to reproduce the working conditions and operating process of shield machine drivers, and existing fatigue feature fusion methods usually show low recognition accuracy. To address them, shield machine drivers on Shenyang metro line 4 in China were taken as the research subjects, and a multi-modal physiological feature fusion method based on an L2-regularized stacked auto-encoder was designed. First, the ErgoLAB cloud platform was used to extract the combined energy feature (E), the reaction time, the HRV (heart rate variability) time-domain SDNN (standard deviation of normal-to-normal intervals) index, the HRV frequency-domain LF/HF (energy ratio of low frequency to high frequency) index and the pupil diameter index from EEG (electroencephalogram) signals, skin signals, pulse signals and eye movement data, respectively. Second, the physiological signal features were extracted using the WPT (wavelet packet transform) method and time–frequency analysis. Then, an auto-encoder-based method for driving fatigue feature fusion was designed, using L2 regularization to address the over-fitting that arises when training on small-sample data sets. The optimal hyper-parameters of the model were verified through controlled-variable experiments, which reduces the loss of multi-modal feature data during compression fusion and the information loss rate of the fused index. The results show that the proposed method outperforms competing methods in recognition accuracy and effectively reduces the loss rate of deep features seen in existing decision-level fusion.
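
To illustrate the idea of compressing several physiological indices into a smaller fused representation under an L2 penalty, the following is a minimal sketch and not the authors' implementation. It assumes a single linear encoder/decoder pair rather than a stacked network, plain batch gradient descent, and synthetic five-feature data. The names `toy_data`, `train`, `loss` and `matvec`, the loadings, and all hyper-parameter values are hypothetical choices made for this example only.

```python
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def toy_data(n, seed=0):
    """Synthetic 5-feature samples driven by one latent factor (not study data)."""
    rng = random.Random(seed)
    coef = [1.0, 0.8, -0.6, 0.7, -0.9]  # arbitrary loadings, assumed for illustration
    rows = []
    for _ in range(n):
        t = rng.uniform(-1, 1)  # hypothetical latent "fatigue level"
        rows.append([c * t + rng.gauss(0, 0.05) for c in coef])
    return rows


def loss(W, V, data, lam):
    """Mean squared reconstruction error plus L2 penalty on all weights."""
    mse = 0.0
    for x in data:
        x_hat = matvec(V, matvec(W, x))
        mse += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    mse /= len(data)
    penalty = sum(w * w for M in (W, V) for row in M for w in row)
    return mse + lam * penalty


def train(data, k, lam=1e-3, lr=0.05, epochs=200, seed=0):
    """Fit a linear auto-encoder x -> W x -> V W x by batch gradient descent."""
    rng = random.Random(seed)
    d, n = len(data[0]), len(data)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(k)]  # encoder, k x d
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(d)]  # decoder, d x k
    for _ in range(epochs):
        # Start each gradient from the derivative of the L2 penalty term.
        gW = [[2 * lam * w for w in row] for row in W]
        gV = [[2 * lam * v for v in row] for row in V]
        for x in data:
            h = matvec(W, x)                                 # fused (compressed) features
            err = [a - b for a, b in zip(matvec(V, h), x)]   # reconstruction minus input
            # Error propagated back to the hidden layer: V^T err.
            back = [sum(V[i][j] * err[i] for i in range(d)) for j in range(k)]
            for i in range(d):
                for j in range(k):
                    gV[i][j] += 2 * err[i] * h[j] / n
                    gW[j][i] += 2 * back[j] * x[i] / n
        W = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, gW)]
        V = [[v - lr * g for v, g in zip(rv, rg)] for rv, rg in zip(V, gV)]
    return W, V


if __name__ == "__main__":
    data = toy_data(30, seed=1)
    W, V = train(data, k=2)
    print("regularized loss:", loss(W, V, data, 1e-3))
    print("fused features for first sample:", matvec(W, data[0]))
```

In this sketch the coefficient `lam` controls how strongly large weights are penalized, which is the mechanism by which L2 regularization discourages over-fitting on small data sets. A stacked variant would chain several such encoder layers, typically with non-linear activations.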