“…Face, voice, and social scene processing in monkeys have been individually explored, to some extent, from the behavioural (Gothard et al, 2004, 2009; Rendall et al, 1996; Sliwa et al, 2011) and the neuronal point of view (Aparicio et al, 2016; Arcaro et al, 2017; Cohen et al, 2007; Eifuku, 2014; Gil-da-Costa et al, 2004, 2006; Hesse & Tsao, 2020; Issa & DiCarlo, 2012; Joly, Pallier, et al, 2012; Moeller et al, 2008; Ortiz-Rios et al, 2015; Petkov et al, 2008; Pinsk et al, 2005, 2009; Poremba et al, 2003, 2004; Romanski et al, 2005; Russ et al, 2008; Schwiedrzik et al, 2015; Sliwa & Freiwald, 2017; Tsao et al, 2003). Audiovisual integration during naturalistic social stimuli has recently been shown in specific regions of the monkey face-patch system (Khandhadia et al, 2021), the voice-patch system (Ghazanfar, 2009; Ghazanfar et al, 2005; Perrodin et al, 2014, 2015), as well as in the prefrontal voice area (Romanski, 2012). However, beyond combining sensory information, social perception also involves integrating contextual, behavioural and emotional information (Freiwald, 2020; Ghazanfar & Santos, 2004).…”