Maintaining optimal temperatures in the critical parts of an induction traction motor is crucial for railway propulsion systems. A reduced-order lumped-parameter thermal network (LPTN) model enables computationally inexpensive yet accurate temperature estimation; however, it requires empirical parameter estimation. This calibration is typically performed in a controlled laboratory setting and demands considerable supervised human effort. Advances in machine learning (ML) across diverse domains have, however, made it possible to parameterize drive-system models outside the laboratory. This paper presents an innovative use of a multi-agent reinforcement learning (MARL) approach for the parameterization of an LPTN model. First, a set of reinforcement learning (RL) agents is trained to estimate optimized thermal parameters from simulated data over several driving cycles (DCs). The choice of RL agent and the number of neurons in the RL model are determined by the variability of the driving-cycle data. Transfer learning is then applied to new driving-cycle data collected on the measurement setup, and statistical analysis and clustering techniques are proposed for selecting an RL agent pre-trained on historical data. It is shown that, by combining these reinforcement learning techniques, the RL models can be refined and adjusted to effectively capture the complexities of the thermal dynamics. The proposed MARL framework accurately reflects the motor’s thermal behavior under various driving conditions, and the use of transfer learning yields significant improvements in temperature-prediction accuracy on new driving-cycle data. The approach aims to enable more adaptive and efficient thermal management strategies for railway propulsion systems.
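To make the parameter-estimation idea concrete, the following minimal sketch (not the paper's implementation) fits a single-node LPTN to "measured" temperatures. The RL agents are stood in for by a simple random-search optimizer, and all names, parameter values, and the synthetic driving cycle are illustrative assumptions.

```python
# Minimal sketch: tune LPTN parameters (thermal resistance R, thermal
# capacitance C) against measured temperatures. A random-search loop
# stands in for the RL agents described in the abstract; all values
# here are illustrative assumptions, not the paper's setup.
import random


def simulate_lptn(R, C, losses, t_amb=25.0, dt=1.0):
    """Forward-Euler simulation of dT/dt = (P - (T - t_amb)/R) / C."""
    T = t_amb
    temps = []
    for P in losses:
        T += dt * (P - (T - t_amb) / R) / C
        temps.append(T)
    return temps


def mse(a, b):
    # Squared error via multiplication (overflows to inf instead of
    # raising, so unstable candidates are simply rejected below).
    return sum((x - y) * (x - y) for x, y in zip(a, b)) / len(a)


def estimate_parameters(measured, losses, episodes=2000, seed=0):
    """Random-search stand-in for an RL agent: perturb (R, C) and keep
    the candidate if it reduces the temperature-prediction error."""
    rng = random.Random(seed)
    R, C = 1.0, 100.0  # initial guess
    best = mse(simulate_lptn(R, C, losses), measured)
    for _ in range(episodes):
        # Clamp candidates to a physically plausible, numerically
        # stable range (assumed bounds).
        R_new = min(10.0, max(0.05, R + rng.gauss(0.0, 0.05)))
        C_new = min(2000.0, max(10.0, C + rng.gauss(0.0, 5.0)))
        err = mse(simulate_lptn(R_new, C_new, losses), measured)
        if err < best:
            R, C, best = R_new, C_new, err
    return R, C, best


# Synthetic "driving cycle": alternating load steps as the loss profile.
losses = [200.0] * 300 + [50.0] * 300 + [300.0] * 300
measured = simulate_lptn(R=0.35, C=450.0, losses=losses)  # ground truth
R_hat, C_hat, err = estimate_parameters(measured, losses)
```

In the paper's framework each agent would instead learn a policy over parameter updates, and transfer learning would warm-start the search on a new driving cycle from an agent pre-trained on a statistically similar historical cycle.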