The paper aims to study speaker and listener behavior in dyadic speech communication. A multimodal (speech and video) corpus of dyadic face-to-face conversations on various topics was created. The corpus was manually labeled on several layers (text transcription, backchannel modality and function, POS tags, prosody, and gaze). The statistical analysis was done on the proposed corpus. We focused on backchannel inviting cues on the speaker side and backchannels on the listener side and their patterns. We aimed to study interlocutor backchannel behavior and backchannel-related signals. The results of the analysis show similar patterns in the case of backchannel inviting cues between Slovak and English data and highlight the importance of gaze direction in a face-to-face speech communication scenario. The described corpus and results of the analysis are one of the first steps leading towards natural artificial intelligence-driven human–computer speech conversation.