1985
DOI: 10.1121/1.392336

Intermodal timing relations and audio-visual speech recognition by normal-hearing adults

Abstract: Audio-visual identification of sentences was measured as a function of audio delay in untrained observers with normal hearing; the soundtrack was replaced by rectangular pulses originally synchronized to the closing of the talker's vocal folds and then subjected to delay. When the soundtrack was delayed by 160 ms, identification scores were no better than when no acoustical information at all was provided. Delays of up to 80 ms had little effect on group-mean performance, but a separate analysis of a subgroup …
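The abstract's reported thresholds can be summarized in a small sketch. This is a toy illustration, not anything from the paper itself: the category names and cutoff handling are assumptions, using only the two figures the abstract reports (little effect on group-mean performance up to 80 ms; no audio benefit at 160 ms).

```python
# Toy sketch (assumed categories, not from the paper): classify an audio
# delay against the two thresholds reported in the abstract.

TOLERATED_MS = 80    # group-mean performance roughly unaffected up to here
NO_BENEFIT_MS = 160  # at this delay, scores matched the no-audio condition

def classify_delay(delay_ms: float) -> str:
    """Rough category for an audio delay, per the abstract's findings."""
    if delay_ms <= TOLERATED_MS:
        return "tolerated"
    elif delay_ms < NO_BENEFIT_MS:
        return "degraded"
    else:
        return "no audio benefit"

for d in (0, 40, 80, 120, 160):
    print(d, classify_delay(d))
```

The region between 80 ms and 160 ms is labeled "degraded" here only as a placeholder; the abstract is truncated before it characterizes intermediate delays in detail.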

Cited by 165 publications (142 citation statements) · References 0 publications
“…Like previous works have shown [21,26,42], asynchrony can be detected at very short thresholds, likely depending on the nature of the audiovisual event. Our observations indicate that such caution would involve the avoidance of video stream delays entirely, since the windows of temporal integration for two of three content types do not extend to audio lead asynchrony, irrespective of distortion.…”
Section: Results (mentioning)
Confidence: 97%
“…While some experiments implement a temporal order judgement task, asking participants to determine which signal precedes the other [39], others rely on a simultaneity judgement task that requires assessments on perceived synchrony [5]. Still others ask for the detection of gradually introduced asynchrony [8], or discrimination between presentations with different temporal offsets [26]. Figure 1 provides a summary of previously published thresholds corresponding to perceived synchrony for different event types, but also established using different measures.…”
Section: Temporal Integration and Quality Distortion (mentioning)
Confidence: 99%
“…Thus, the video token of the original tape and an edited audio token were recorded simultaneously onto a second tape, resulting in a new synchronous audiovisual token. The lag time for dubbing was found to be no greater than 9.4 msec, well below the 80-msec range required for observers to detect an audiovisual asynchrony (McGrath & Summerfield, 1985).…”
Section: Methods (mentioning)
Confidence: 99%
“…In the McGurk effect, a lipread speech token affects the phonetic content of a speech sound that is heard (McGurk & MacDonald, 1976). The effect was biggest when auditory vowels were synchronized with the original mouth movements (McGrath & Summerfield, 1985), but the effect survives, even if audition lagged vision by 180 msec (see also Soto-Faraco & Alsius, 2007; these studies show that participants can still perceive a McGurk effect when they can quite reliably perform TOJs at intervals above the JND). So, speech-sound identification can be influenced by lipread speech even if the two are perceived as being out of sync.…”
Section: The Window of Temporal Integration (mentioning)
Confidence: 98%