The Virtual Track Train (VTT) is an urban public transportation system that combines tire-based running gear with rail transit management. Effective control of such a system requires precise state estimation, a task made difficult by the multi-articulated structure of the vehicle. This study addresses the challenge by focusing on state estimation for the first unit under significant interference, and introduces a fusion state estimation strategy based on Gaussian Process Regression (GPR) and the Interacting Multiple Model (IMM) method. First, a joint model of the first unit is established, consisting of a dynamics model as the main model and a GPR-based residual model that compensates for the main model’s error. The proposed fusion strategy has two components: a kinematic-model-based method for transient and high-acceleration phases, and a joint-model-based method for near-steady-state and low-acceleration conditions. The IMM method integrates these two approaches. The states of the subsequent units are then computed from the first unit’s state, the articulation angles, and the filtered yaw-rate data. Hardware-in-the-loop (HIL) simulation validates the strategy: the average lateral speed estimation error is below 0.02 m/s and the maximum error does not exceed 0.22 m/s. Incorporating state estimation has only a minor effect on VTT control performance, reducing it by 3–6%.
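To illustrate how an IMM scheme can blend two estimators, the following minimal sketch shows only the mode-probability update and output-combination steps of a standard IMM filter for a scalar lateral speed. It is not the implementation used in this study. The function names, the transition matrix, and all numeric values are hypothetical, and the per-model filters and the IMM mixing step are omitted. Model 0 stands in for the kinematic-model-based estimator and model 1 for the joint-model-based estimator (dynamics model plus GPR residual).

```python
import math

def gaussian_likelihood(innovation, variance):
    """Likelihood of a scalar innovation under a zero-mean Gaussian."""
    return math.exp(-0.5 * innovation ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)

def update_mode_probabilities(mu, trans, likelihoods):
    """IMM mode-probability update: predict with the Markov chain, then reweight."""
    n = len(mu)
    # Predicted probability of model j: c_j = sum_i p_ij * mu_i
    predicted = [sum(trans[i][j] * mu[i] for i in range(n)) for j in range(n)]
    unnormalized = [likelihoods[j] * predicted[j] for j in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def combine_estimates(mu, estimates, variances):
    """Probability-weighted mean and variance (including spread-of-means term)."""
    x = sum(m * e for m, e in zip(mu, estimates))
    var = sum(m * (v + (e - x) ** 2) for m, e, v in zip(mu, estimates, variances))
    return x, var

# Hypothetical values, chosen only for illustration.
mu_prev = [0.5, 0.5]                      # prior model probabilities
trans = [[0.95, 0.05], [0.05, 0.95]]      # assumed model transition matrix
innovations = [0.30, 0.05]                # measurement residual of each model
innovation_vars = [0.04, 0.04]            # innovation variance of each model

likelihoods = [gaussian_likelihood(r, s) for r, s in zip(innovations, innovation_vars)]
mu = update_mode_probabilities(mu_prev, trans, likelihoods)

# Lateral-speed estimates (m/s) and variances from the two hypothetical filters.
v_y, v_y_var = combine_estimates(mu, [0.42, 0.35], [0.02, 0.01])
print(mu, v_y, v_y_var)
```

In this sketch, the model whose prediction better matches the measurement (the smaller innovation) receives the larger probability, so the fused estimate leans toward it. This is the mechanism that lets an IMM scheme shift weight between a kinematic estimator and a model-based estimator as driving conditions change.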