Children use information from both the auditory and visual modalities to aid in understanding speech. A dramatic illustration of this multisensory integration is the McGurk effect, an illusion in which an auditory syllable is perceived differently when it is paired with an incongruent mouth movement. However, there are significant interindividual differences in McGurk perception: some children never perceive the illusion, while others always do. Because converging evidence suggests that the posterior superior temporal sulcus (STS) is a critical site for multisensory integration, we hypothesized that activity within the STS would predict susceptibility to the McGurk effect. To test this idea, we used blood-oxygen-level-dependent functional magnetic resonance imaging (BOLD fMRI) in seventeen children aged 6 to 12 years to measure brain responses to three audiovisual stimulus categories: McGurk incongruent, non-McGurk incongruent and congruent syllables. Two separate analysis approaches, one using independent functional localizers and another using whole-brain voxel-based regression, showed differences in the left STS between perceivers and non-perceivers. The STS of McGurk perceivers responded significantly more strongly than that of non-perceivers to McGurk syllables, but not to other stimuli, and perceivers’ hemodynamic responses in the STS were significantly prolonged. In addition to the STS, weaker differences between perceivers and non-perceivers were observed in the fusiform face area (FFA) and extrastriate visual cortex. These results suggest that the STS is an important source of interindividual variability in children’s audiovisual speech perception.