1995
DOI: 10.1007/bf00849043
A comparison of models for fusion of the auditory and visual sensors in speech perception

Cited by 25 publications (9 citation statements)
References 56 publications
“…Studies of cross-modal integration have focused both on the effect of congruency, when the audio and video stimuli are taken from the same or different utterances (McGurk and MacDonald 1976; Summerfield and McGrath 1984), the same or different speakers (Kamachi et al 2003), and also on the role of temporal synchrony, when the audio and video stimuli are congruent but temporally misaligned (Dixon and Spitz 1980; McGrath and Summerfield 1985). All of these studies have found consistent improvements in perception when visual information is used to supplement auditory information, even when the information in the video signal is imperfect or even partially inconsistent with the audio signal (Summerfield 1987; Robert-Ribes et al 1995). …”
mentioning (confidence: 94%)
“…Findings from other areas of research have suggested the existence of a mechanism or representation common to the processing of speech input from the auditory and visual modalities (Campbell, 1987;Watson, Qiu, Chamberlain, & Li, 1996). Cross-modal interaction at some level has demonstrated that information from different sensory modalities can be combined in perception, as in the McGurk effect (e.g., McGurk & MacDonald, 1976), and that input to one modality can influence processing in another (see, e.g., Robert-Ribes, Schwartz, & Escudier, 1995, for a review). Using magnetoencephalographic recordings, visual input specifically from lip movements was found to influence auditory cortical activity (Sams, Aulanko, Hämäläinen, Hari, Lounasmaa, Lu, & Simola, 1991).…”
mentioning (confidence: 99%)
“…As reported in [29], both vocal intonations and facial expressions determine the listener's affective state in up to 93% of cases. Recently, increased attention has been paid to analyzing multimodal information in emotion recognition (e.g., [1,7,[9][10][11][12][13][30][31][32][33][34]). However, most of them still use deliberate and often exaggerated facial displays (e.g., [2,5]).…”
Section: Introduction
mentioning (confidence: 99%)