Upon hearing someone’s speech, a listener can make inferences about the speaker’s age, gender identity, socioeconomic status, and linguistic background. However, it is not clear to what extent listeners use these factors to decode the speech signal itself. Here, we use an audio-visual task to measure whether listeners’ accentedness and intelligibility judgments (i.e., speech perception) change as a function of the racial information that they see on a computer screen. Three varieties of English (American, British, and Indian) were each presented with either a White female face or a South Asian female face. Results showed that, for all varieties, listeners’ ability to transcribe sentences (i.e., intelligibility) decreased and their accentedness judgments increased when speech was paired with South Asian faces. However, the increase in accentedness judgments was modulated by participants’ social network diversity. In short, the racial diversity in people’s social networks impacts accentedness judgments but not speech perception.