Understanding a social event requires assigning the participating entities to roles such as agent and patient, a mental operation that is reportedly effortless. We investigated whether, in the processing of visual scenes, role assignment occurs automatically (i.e., even when the task does not require it), based on visuo-spatial information alone. Participants (male and female human adults) viewed a series of images featuring the same male and female actors side by side, one in an agent-like (more dynamic, leaning-forward) posture and the other in a patient-like (static, less dynamic) posture. Their task was to indicate the side (left/right) of a target actor (the female). From trial to trial, body postures changed, while the roles, defined by posture type, either switched or repeated. If participants spontaneously encoded the actors as agent and patient, they should respond more slowly when roles switched from trial n-1 to trial n than when roles repeated (a role-switch cost). Results confirmed this hypothesis (Experiments 1-3). A role-switch cost also emerged when roles were defined by another visual relational cue, relative position (where one actor stood relative to the other), but not when the actors were presented in isolation (Experiments 4-6). These findings reveal a mechanism for automatic role assignment based on the encoding of visual relational information in social (multiple-person) scenes. Because role assignment on one trial affected the same process on the subsequent trial, even though the actors' postures changed across trials, this mechanism must assign the entities in a scene to the abstract categories of agent and patient.