This paper proposes a novel approach to assessing audiovisual integration for both congruent and incongruent speech stimuli using reaction times (RTs). The experiments are based on the McGurk effect, in which a listener is presented with incongruent audiovisual speech signals. A typical example involves the auditory consonant /b/ combined with a visually articulated /g/, often yielding a perception of /d/. We quantify the amount of integration relative to the predictions of a parallel independent model, as a function of attention and of the congruency between the auditory and visual signals. We assessed RT distributions for congruent and incongruent auditory and visual signals in a within-subjects signal detection paradigm under conditions of divided versus focused attention. Results showed that listeners often received only minimal benefit from congruent auditory-visual stimuli, even when such information could have improved performance. Incongruent stimuli adversely affected performance in both the divided and focused attention conditions. Our findings support a parallel model of auditory-visual integration with interactions between the auditory and visual channels.

Keywords Speech perception · Attention: selective · Reaction time methods
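To preview the benchmark logic, the sketch below illustrates one standard way to formalize a parallel independent model of audiovisual processing: a race between independent auditory and visual channels, in which a response is triggered by whichever channel finishes first. Under those assumptions, the predicted audiovisual RT distribution is P(RT_AV ≤ t) = 1 − (1 − F_A(t))(1 − F_V(t)), where F_A and F_V are the unimodal RT distributions; observed audiovisual RTs above or below that prediction indicate facilitatory or inhibitory interactions, respectively. The data and function names are hypothetical illustrations, not the analysis code used in this paper.

```python
import numpy as np

def ecdf(rts, t_grid):
    """Empirical CDF F(t) = P(RT <= t), evaluated on a grid of times."""
    rts = np.sort(np.asarray(rts))
    return np.searchsorted(rts, t_grid, side="right") / len(rts)

def race_model_prediction(rt_a, rt_v, t_grid):
    """Parallel independent (race) benchmark: the first-terminating
    channel determines the response, so
    P(RT_AV <= t) = 1 - (1 - F_A(t)) * (1 - F_V(t))."""
    return 1.0 - (1.0 - ecdf(rt_a, t_grid)) * (1.0 - ecdf(rt_v, t_grid))

# Hypothetical RT samples (in ms) for auditory-only, visual-only,
# and audiovisual trials.
rng = np.random.default_rng(seed=1)
rt_a = rng.gamma(shape=8.0, scale=60.0, size=200)
rt_v = rng.gamma(shape=8.0, scale=75.0, size=200)
rt_av = rng.gamma(shape=8.0, scale=50.0, size=200)

t_grid = np.linspace(200.0, 1200.0, 101)
excess = ecdf(rt_av, t_grid) - race_model_prediction(rt_a, rt_v, t_grid)
# excess > 0: faster than the independent benchmark (facilitation);
# excess < 0: slower than the benchmark (interference).
```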
What cognitive mechanisms underlie speech recognition when audition is supplemented with visual information? The modern era of research into how auditory and visual speech cues interact began with Sumby and Pollack's (1954) seminal experimental work on audiovisual (AV) enhancement: They showed that visual cues provided by a talker's lip movements facilitate auditory recognition across a range of signal-to-noise ratios. However, Massaro (1987a) showed that this outcome did not necessarily demonstrate integration, since a single-channel model could theoretically predict the results.

In a critical study two decades later, McGurk and MacDonald (1976) reported a dramatic perceptual integration phenomenon that resulted from the presentation of incongruent auditory-visual speech signals. In what became known as the "McGurk effect," presentation of the auditory consonant /b/ over a visually articulated /g/ yielded a fused percept of /d/. Audiovisual fusions such as these occur when the perceptual system maps cues from conflicting signals onto a phonemic category distinct from either input signal.¹ Thus, the McGurk effect is a prime candidate with which to probe the mechanisms underlying integration.

Several studies of the McGurk effect have been carried out, with the majority using mean accuracy as the dependent variable. In these studies, performance on auditory-only and visual-only trials is compared with accuracy on audiovisual trials, usually via confusion matrices (e.g., Massaro, 1987a, 1998, 2004). These experimental designs and modeling efforts have shed considerable light on speech integration: Auditory and visual cues appear to interact in a multiplicative manner.

¹ This cannot be demonstrated conclusively without also measuring responses to the single-modality presentations (cf. Massaro, 1987a).
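As a concrete illustration of a multiplicative account, the sketch below implements the combination rule at the heart of Massaro's fuzzy logical model of perception (FLMP): each modality assigns a degree of support to every response alternative, the supports are multiplied, and the products are normalized across alternatives. The numerical support values are hypothetical, chosen only to show how moderate support for /d/ in both modalities can outweigh strong support for /b/ or /g/ in a single modality, yielding the fused McGurk percept.

```python
import numpy as np

def flmp_combine(auditory_support, visual_support):
    """FLMP-style integration: multiply the auditory and visual support
    for each response alternative, then normalize across alternatives."""
    combined = np.asarray(auditory_support) * np.asarray(visual_support)
    return combined / combined.sum()

# Hypothetical degrees of support for the alternatives /b/, /d/, /g/.
a = np.array([0.9, 0.5, 0.1])  # audition strongly favors /b/
v = np.array([0.1, 0.5, 0.9])  # vision strongly favors /g/

print(flmp_combine(a, v).round(3))
# [0.209 0.581 0.209]: the fused alternative /d/ dominates, even though
# neither modality favored it on its own.
```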