The study advances the Crossmodal Alignment Framework to explore multimodal discourse in its three formats (semiotic, communicative, and perceptual) via a multimodal experiment. It considers the alignment patterns obtained from two semiotic modes (text and image), conveyed through two communicative modes (speech and gesture), and perceived through two perceptual modes (visual and auditory). The common research framework treats these patterns as modulated by discourse tasks. The study reports the results of multimodal experiments in which the participants engaged in three discourse tasks: 1) a receptive task, which involves obtaining information from text and image stimuli; semiotic alignment patterns are identified indirectly via the participants’ gaze responses; 2) a productive task, in which the participants communicate the information in a monologic format; communicative alignment patterns are identified directly via their speech and gestures; 3) a receptive-productive task, in which the participants perceive information visually and auditorily; alignment patterns are identified directly via the participants’ gaze behavior, contingent on the stimuli’s areas of interest, and indirectly via their speech responses. The data analysis makes it possible to determine and scale the degree of crossmodal alignment across the discourse tasks, which helps identify the contribution of each mode to solving these tasks. The research framework and the results obtained contribute to the further development of multimodal discourse analysis methods.