Background
In natural communication, messages can be expressed through multiple modalities and processed predictively. As extra-linguistic information, gesture has frequently been observed to accompany speech. The question arises as to how visual, spatial-motoric gesture interacts with auditory, linear-analytic speech. Adopting a cross-modal fragment priming paradigm, we conducted two experiments to investigate the cortical involvement in gesture-speech integration as the lexical representation of speech is proportionally activated and primed by gesture. Proportional presentation of gesture and speech was realized by segmenting gesture and speech fragments into five lengths relative to the gesture discrimination point (DP) and the speech identification point (IP), i.e., 0.5, 0.75, 1, 1.25, and 1.5 times the DP/IP. Experiment 1 quantitatively characterized the informativeness of the five lengths of gesture and speech fragments in terms of nameability. Experiment 2 used event-related potentials (ERPs) to track the neural processes elicited by three of the five fragment lengths, designated as before, at, and after the gesture_DP/speech_IP.
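For illustration only, the sketch below (Python; not part of the study's materials) shows how speech fragments could be cut at the five proportional lengths relative to a per-item identification point. The cut_fragments function, the ip_seconds annotation, and the synthetic waveform are hypothetical; the same slicing logic would apply to gesture video frames and the DP.

```python
import numpy as np

# Hypothetical illustration of proportional fragment cutting; the study's
# actual stimulus-preparation pipeline is not described at this level.
PROPORTIONS = (0.5, 0.75, 1.0, 1.25, 1.5)  # fragment lengths relative to DP/IP

def cut_fragments(waveform, sample_rate, ip_seconds):
    """Return one fragment per proportion, truncated at proportion * IP
    (IP measured in seconds from item onset; assumed annotation)."""
    fragments = {}
    for p in PROPORTIONS:
        end_sample = int(round(p * ip_seconds * sample_rate))
        end_sample = min(end_sample, len(waveform))  # never exceed the item
        fragments[p] = waveform[:end_sample]
    return fragments

# Example: a synthetic 1-second item whose IP falls at 400 ms.
sr = 44100
item = np.random.randn(sr)                 # placeholder waveform
for p, frag in cut_fragments(item, sr, 0.4).items():
    print(f"{p:>4} x IP -> {1000 * len(frag) / sr:.0f} ms")
```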
Results
Experiment 1 (N = 60) revealed that lexical activation was proportional to processing time, as indicated by positive correlations between gesture/speech fragment lengths and nameability. In Experiment 2 (N = 35), three ERP components were found to index distinct stages of gesture-speech information processing: an N1 component (0–100 ms after speech onset) that was modulated solely by gesture; an N400 effect (300–500 ms) that was found in the before_DP/before_IP, DP/IP, after_DP/IP, and after_DP/after_IP conditions and was inversely correlated with the amount of presented information; and a late positive component (LPC, 500–800 ms) that immediately followed the N400.
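As a purely illustrative sketch of the reported length-nameability relationship (synthetic data generated for the example; not the study's results), a Pearson correlation over the five fragment proportions could be computed as follows.

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic illustration only: nameability is generated as a noisy increasing
# function of fragment length, mimicking the direction (not the magnitude) of
# the positive correlation reported in Experiment 1.
rng = np.random.default_rng(0)
lengths = np.repeat([0.5, 0.75, 1.0, 1.25, 1.5], 12)   # 12 mock items per level
nameability = np.clip(0.6 * lengths + rng.normal(0, 0.1, lengths.size), 0, 1)

r, p = pearsonr(lengths, nameability)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
```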
Conclusion
Our results suggest a proportionally activated representation of gesture and speech that is inherently distributionally graded. Furthermore, they uncover the dynamic neural stages through which lexical representations are progressively activated during gesture-speech integration. Thus, this study provides new insights into multimodal semantic processing.