This study investigates the relative weighting of morphosyntactic and visual cues in spoken-language comprehension, and whether this weighting varies systematically within and between first-language (L1) and second-language (L2) speakers of German. In two experiments, 45 L1 and 39 L2 speakers answered probe questions targeting the action direction of subject- and object-extracted relative clauses, which were presented either in isolation (Experiment 1) or alongside scene depictions that matched or mismatched the action direction expressed in the sentence (Experiment 2). We hypothesized that visual cues contribute to shaping meaning representations in sentence comprehension, and that sensitivity to morphosyntactic cues during interpretation may predict reliance on visual cues in both L1 and L2 comprehension. We found reliable effects of visual cues in both groups and for both relative-clause types. Further, proxies of morphosyntactic sensitivity were associated with higher agent-identification accuracy, especially for object-extracted relative clauses presented with mismatching visual cues. Lastly, morphosyntactic sensitivity was a better predictor of accuracy than L1–L2 grouping in our dataset. The results extend the generalizability of models of visuo-linguistic integration across populations and experimental settings. Further, the observed differences in sentence comprehension can be explained in terms of individual cue-weighting patterns, and thus point to the crucial role of sensitivity to distinct cue types in accounting for thematic-role assignment success in L1 and L2 speakers alike.