Human communication requires processing voice efficiently while simultaneously reducing the impact of environmental noise. By manipulating background noise, we aimed to clarify the neural mechanisms that allow voice comprehension in noisy situations. Our results point to the spatial and temporal coexistence of lateral and medial temporal cortex networks when voice is easily detected in highly noisy conditions, revealing neural underpinnings necessary for human communication in realistic situations.