Recent work indicates that the central nervous system assesses the causality of visual and inertial information when estimating qualitative characteristics of self-motion and spatial orientation, and forms multisensory percepts in accordance with the outcome of these assessments. Here, we extend the assessment of this Causal Inference (CI) strategy to the quantitative domain of traveled distance. We present a formal model of how stimuli result in sensory estimates, how percepts are constructed from sensory estimates, and how responses result from percepts. Starting from this formalization, we derived probabilistic formulations of CI and competing models for the perception of traveled distance. In an experiment, participants (n=9) were seated in the Max Planck Cablerobot Simulator and shown a photo-realistic virtual rendering of the simulator hall via a Head-Mounted Display. Using this setup, participants were presented with various unisensory and (incongruent) multisensory visual-inertial horizontal linear surge motions that differed only in amplitude (i.e., traveled distance). Participants performed both a Magnitude Estimation and a Two-Interval Forced Choice task. Overall, model comparisons favor the CI model, but analyses of individual participants show that a Cue Capture strategy is preferred in most cases. Parameter estimates indicate that visual and inertial sensory estimates follow Stevens' power law with a positive exponent, and that noise increases with physical distance in accordance with Weber's law. Responses were biased towards the mean stimulus distance, consistent with an interaction between percepts and prior knowledge in the formation of responses. Magnitude estimation data further showed a regression-to-the-mean effect. The experimental data did not provide unambiguous support for the CI model. However, model derivations and fit results demonstrate that it can reproduce the empirical findings, which argues in favor of the CI model. 
Moreover, the methods outlined in the present study demonstrate how different sources of distortion in responses may be disentangled by combining psychophysical tasks.
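To make the model components named above more concrete, the following is a minimal sketch and not the formulation derived in this study. It assumes Gaussian sensory noise, a Gaussian prior over distance, sensory means that follow Stevens' power law, noise whose standard deviation scales linearly with physical distance (Weber's law), and the widely used Gaussian model-averaging form of Bayesian Causal Inference (Körding et al., 2007). The function and parameter names (`sensory_params`, `ci_estimate`, `gain`, `exponent`, `weber`, `p_common`, `mu_p`, `sd_p`) are hypothetical and chosen only for illustration.

```python
import math


def sensory_params(distance, gain, exponent, weber):
    """Hypothetical stimulus-to-sensation mapping (illustrative assumption).

    Mean sensory estimate follows Stevens' power law, gain * d**exponent;
    noise SD grows linearly with physical distance (Weber's law).
    """
    return gain * distance ** exponent, weber * distance


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and SD sd at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def ci_estimate(x_vis, x_ine, sd_vis, sd_ine, p_common, mu_p, sd_p):
    """Gaussian Causal Inference with model averaging (sketch).

    Returns the inertial-distance percept and the posterior probability
    that the visual and inertial cues share a common cause.
    """
    vv, vi, vp = sd_vis ** 2, sd_ine ** 2, sd_p ** 2

    # Likelihood of both cues under a common cause (C = 1).
    denom = vv * vi + vv * vp + vi * vp
    quad = ((x_vis - x_ine) ** 2 * vp
            + (x_vis - mu_p) ** 2 * vi
            + (x_ine - mu_p) ** 2 * vv) / denom
    like_common = math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(denom))

    # Likelihood under independent causes (C = 2): each cue vs. the prior.
    like_indep = (normal_pdf(x_vis, mu_p, math.sqrt(vv + vp))
                  * normal_pdf(x_ine, mu_p, math.sqrt(vi + vp)))

    # Posterior probability of a common cause (Bayes' rule).
    post_common = (like_common * p_common
                   / (like_common * p_common + like_indep * (1 - p_common)))

    # Reliability-weighted estimates under each causal structure,
    # both including the prior over distance.
    fused = (x_vis / vv + x_ine / vi + mu_p / vp) / (1 / vv + 1 / vi + 1 / vp)
    inertial_only = (x_ine / vi + mu_p / vp) / (1 / vi + 1 / vp)

    # Model averaging: weight each estimate by its structure's posterior.
    percept = post_common * fused + (1 - post_common) * inertial_only
    return percept, post_common


if __name__ == "__main__":
    mu, sd = sensory_params(4.0, gain=1.5, exponent=0.5, weber=0.1)
    print(mu, sd)
    print(ci_estimate(2.0, 2.1, 0.2, 0.2, 0.5, 2.0, 2.0))
```

In this sketch, a small visual-inertial discrepancy yields a high common-cause posterior and a percept close to the fused estimate, whereas a large discrepancy shifts the percept towards the inertial-only estimate. A Cue Capture strategy, by contrast, would correspond to reporting a single cue's estimate regardless of the other cue.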