The ability to detect and understand other people’s social interactions is a fundamental part of the human visual experience that develops early in infancy and is shared with other primates. However, the neural computations underlying this ability remain largely unknown. Is the detection of social interactions a rapid perceptual process, or a slower post-perceptual inference? Here we used magnetoencephalography (MEG) decoding and computational modeling to ask whether social interactions can be detected via fast, feedforward processing. During MEG recording, subjects viewed snapshots of visually matched real-world scenes, each containing a pair of people who were either engaged in a social interaction or acting independently. The presence versus absence of a social interaction could be read out from subjects’ MEG data spontaneously, even while subjects performed an orthogonal task. This readout generalized across different scenes, revealing abstract representations of social interactions in the human brain. These representations, however, did not come online until quite late, at 300 ms after image onset, well after the time window of feedforward visual processing. In a second experiment, we found that social interaction readout occurred at this same latency even when subjects performed an explicit social interaction detection task. Consistent with these latency results, a standard feedforward deep neural network did not contain an abstract representation of social interactions at any model layer. We further showed that MEG responses distinguished between different types of social interactions (mutual gaze versus joint attention) even later, around 500 ms after image onset. Taken together, these results suggest that the human brain spontaneously extracts the presence and type of others’ social interactions, but does so slowly, likely relying on iterative top-down computations.
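To make the cross-scene readout concrete, the sketch below illustrates the general logic of time-resolved decoding with generalization across scenes: a classifier is trained on trials from one set of scenes and tested on trials from held-out scenes, separately at each time point, so that above-chance accuracy reflects a scene-general (abstract) representation of interaction presence. This is a minimal illustration, not the authors' analysis code; the array shapes, the function name cross_scene_decoding, and the use of a logistic-regression classifier are assumptions for the example.

```python
# Minimal sketch of time-resolved MEG decoding with cross-scene generalization
# (illustrative only; variable names and data shapes are assumed, not the
# authors' pipeline).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cross_scene_decoding(X, y, scene_ids, train_scenes, test_scenes):
    """Decode interaction presence at each time point, generalizing across scenes.

    X           : (n_trials, n_sensors, n_times) epoched MEG data
    y           : (n_trials,) labels, 1 = social interaction present, 0 = absent
    scene_ids   : (n_trials,) identifier of the scene shown on each trial
    train_scenes, test_scenes : disjoint sets of scene identifiers
    Returns     : (n_times,) decoding accuracy over time
    """
    train = np.isin(scene_ids, list(train_scenes))
    test = np.isin(scene_ids, list(test_scenes))
    n_times = X.shape[2]
    accuracy = np.zeros(n_times)
    for t in range(n_times):
        # Fit a fresh classifier on sensor patterns at this time point only.
        clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        clf.fit(X[train, :, t], y[train])
        # Test on trials from scenes the classifier never saw during training.
        accuracy[t] = clf.score(X[test, :, t], y[test])
    return accuracy


# Synthetic data shaped like a typical epoched MEG recording
# (306 sensors; a coarse 100-sample time axis is used here purely for speed).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 306, 100))
y = rng.integers(0, 2, 200)
scene_ids = rng.integers(0, 20, 200)

acc = cross_scene_decoding(X, y, scene_ids,
                           train_scenes=range(0, 10),
                           test_scenes=range(10, 20))
# In the study's framing, above-chance accuracy emerging only from ~300 ms
# after image onset would indicate a late, abstract representation of
# social interaction presence.
```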