Modeling train dynamics is challenging because of the complex dynamic characteristics of trains and their complicated operating environment. For heavy haul trains, flexible formations, heavy carriage loads, and the nonlinear behavior of air braking make the task harder still. In this study, a novel data-driven train dynamics modeling method is designed by combining the attention mechanism (AM) with the gated recurrent unit (GRU) neural network. The proposed learning network consists of encoding, decoding, attention, and context layers that capture the relationship between the train states and the control command, the line condition, and other influencing factors. To address the lack of data for new types of heavy haul trains that are yet to be deployed, the model-agnostic meta-learning (MAML) framework is adopted to transfer knowledge from tasks supported by large amounts of field data to data-insufficient tasks. Effective knowledge transfer improves the utilization of data resources, reduces data requirements, and lowers computational costs, which shows considerable potential for sustainable development applications. Simulation results validate the effectiveness of the proposed MAML-based method in enhancing modeling accuracy.
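The MAML idea described above, learning an initialization on data-rich tasks so that a data-poor task can be fitted with a few gradient steps, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a one-parameter linear model `y = w * x` stands in for the attention-based GRU network, synthetic slopes stand in for different train types, and the function names (`adapt`, `maml_step`, `make_task`) and step sizes are hypothetical. It is not the network or training setup used in the study.

```python
import random

def grad_and_hess(w, data):
    """Gradient and second derivative of the mean squared error of y ~ w * x."""
    n = len(data)
    grad = sum(2.0 * x * (w * x - y) for x, y in data) / n
    hess = sum(2.0 * x * x for x, _ in data) / n
    return grad, hess

def mse(w, data):
    """Mean squared error of the model y = w * x on a data set."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def adapt(w, support, alpha):
    """Inner loop: one gradient step on a task's support set."""
    grad, _ = grad_and_hess(w, support)
    return w - alpha * grad

def maml_step(w, tasks, alpha, beta):
    """Outer loop: update the shared initialization from query-set losses."""
    meta_grad = 0.0
    for support, query in tasks:
        _, hess_s = grad_and_hess(w, support)
        w_adapted = adapt(w, support, alpha)
        grad_q, _ = grad_and_hess(w_adapted, query)
        # Chain rule through the inner step: d(w_adapted)/dw = 1 - alpha * hess_s.
        meta_grad += grad_q * (1.0 - alpha * hess_s)
    return w - beta * meta_grad / len(tasks)

def make_task(slope, rng, n=10):
    """Hypothetical task: noiseless samples of y = slope * x, split into support and query sets."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(2 * n)]
    pts = [(x, slope * x) for x in xs]
    return pts[:n], pts[n:]

rng = random.Random(0)

# Data-rich source tasks (assumed slopes between 2.5 and 3.5).
source_tasks = [make_task(rng.uniform(2.5, 3.5), rng) for _ in range(20)]

# Meta-training: learn an initialization that adapts well after one inner step.
w_meta = 0.0
for _ in range(500):
    w_meta = maml_step(w_meta, source_tasks, alpha=0.1, beta=0.05)

# Data-poor target task: only three support samples are available for adaptation.
target_support, target_query = make_task(3.2, rng, n=3)

# Compare one adaptation step from the meta-learned start with one from scratch.
loss_meta = mse(adapt(w_meta, target_support, alpha=0.1), target_query)
loss_scratch = mse(adapt(0.0, target_support, alpha=0.1), target_query)

print("meta-learned initialization:", w_meta)
print("query loss after one step (meta init):", loss_meta)
print("query loss after one step (from scratch):", loss_scratch)
```

In this toy setting the outer update differentiates through the inner gradient step, which is what distinguishes MAML from simply pretraining on pooled data. For a neural network the same chain rule would normally be handled by an automatic differentiation library rather than written by hand.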