Upon hearing someone’s speech, a listener can access information such as the speaker’s age, gender identity, socioeconomic status, and linguistic background. However, it remains an open question whether living in different locales modulates how listeners use these factors to assess speech. Here, an audio-visual test was used to measure whether listeners’ accentedness judgments and speech intelligibility (i.e., speech perception) are modulated by the racial information in the faces they see. Three varieties of English were used: American, British, and Indian English. Speech samples were presented with either a white female face or a South Asian female face. Two experiments were completed in two locales: Gainesville, Florida (USA) and Montreal, Quebec (Canada). Overall, Montreal listeners transcribed sentences more accurately (i.e., showed higher intelligibility) than Gainesville listeners. Moreover, Gainesville listeners’ ability to transcribe the same spoken sentences decreased for all varieties when the speech was paired with South Asian faces. In contrast, seeing a white or a South Asian face did not affect intelligibility of the same spoken sentences for Montreal listeners. Finally, accentedness ratings for American English and Indian English increased when the visual information changed from a white face to a South Asian face in Gainesville, but not in Montreal. These findings suggest that visual cues for race impact speech perception to a greater degree in locales with less ecological diversity.