In wavelength division multiplexing (WDM) systems, excessive linear and nonlinear noise seriously degrades the quality of optical signals, and an effective joint monitoring scheme can prevent the loss of system performance caused by noise accumulation. In this paper, we propose a probability information assisted knowledge distillation (PIAKD) scheme that achieves intelligent joint monitoring of the linear signal-to-noise ratio (SNRL) and the nonlinear signal-to-noise ratio (SNRNL) in WDM systems. In multi-task regression, where the outputs are independent and continuous, PIAKD addresses the longstanding challenge that the student model fails to learn effectively from the teacher model by introducing probability information into the loss function. The effectiveness of the scheme is verified on both a simulated and an experimental WDM system with a symbol rate of 28 GBaud per channel. The simulation results demonstrate that, after PIAKD, the overall mean absolute error (MAE) of the student model for joint SNRL and SNRNL monitoring is reduced by 0.08 dB and 0.09 dB, corresponding to error reductions of 32% and 34%, respectively. Furthermore, compared with the KD scheme without probability information, our scheme reduces the overall MAE of SNRL and SNRNL by 16% and 11%, respectively. The experimental results show that the estimation MAE is reduced by 0.13 dB and 0.16 dB, corresponding to error reductions of 17% and 18%, respectively. Moreover, the floating-point operations (FLOPs) and parameters (Params) of the student model are only 3.30 M and 0.0015 M, respectively, both significantly lower than those of existing joint monitoring schemes for SNRL and SNRNL.
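To make the relationship between the absolute (dB) and relative (%) error figures concrete, the following minimal sketch computes an MAE and shows how a reported absolute reduction and its percentage imply a baseline MAE. The helper names `mae`, `relative_reduction`, and `implied_baseline` are hypothetical and introduced only for illustration; they are not part of the proposed scheme, and the arithmetic assumes the reported figures are rounded values.

```python
def mae(y_true, y_pred):
    """Mean absolute error between reference and estimated SNR values (dB)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(mae_before, mae_after):
    """Fractional error reduction, e.g. 0.32 for a 32% reduction."""
    return (mae_before - mae_after) / mae_before


def implied_baseline(abs_reduction_db, rel_reduction):
    """Baseline MAE (dB) implied by an absolute and a relative reduction."""
    return abs_reduction_db / rel_reduction


# Figures quoted in the text for SNRL in simulation: 0.08 dB and 32%.
baseline_snrl = implied_baseline(0.08, 0.32)
after_snrl = baseline_snrl - 0.08
```

Under these assumptions, the simulated SNRL figures (0.08 dB, 32%) imply a student-model MAE of roughly 0.25 dB before distillation and roughly 0.17 dB after it; the exact values depend on how the reported numbers were rounded.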