Abstract: Joint attention, i.e., the behavior of looking at the same object that another person is looking at, plays an important role in human and human-robot communication. Previous synthetic studies focusing on modeling the early developmental process of joint attention have proposed learning methods that require no explicit instruction for joint attention. In these studies, however, the causal structure between a perception variable (a caregiver's face direction or an individual object) and an action variable (a gaze shift to the caregiver's face or to an object location) was given in advance to enable learning of joint attention. Such a structure should instead be discovered by the robot through its own interaction experiences. In this paper, we investigate how transfer entropy, an information-theoretic measure, can be used to quantify the causality inherent in face-to-face interaction. In computer simulations of human-robot interaction, we examine which pair of perceptions and actions is selected as the causal pair and show that the selected pairs can be used for learning a sensorimotor map for joint attention.
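To make the measure concrete: transfer entropy from a source process X to a target process Y (with history length 1) is the conditional mutual information TE(X→Y) = Σ p(y_{t+1}, y_t, x_t) log₂ [ p(y_{t+1} | y_t, x_t) / p(y_{t+1} | y_t) ], which is asymmetric and therefore suited to ranking candidate perception-action pairs by causal influence. The following is a minimal plug-in (count-based) sketch for discrete time series; the function name and the history length of 1 are our illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from math import log2

def transfer_entropy(x, y):
    """Plug-in estimate of transfer entropy TE(X -> Y), history length 1:
        TE = sum_t p(y_{t+1}, y_t, x_t) * log2( p(y_{t+1}|y_t, x_t) / p(y_{t+1}|y_t) )
    x, y: equal-length sequences of discrete symbols (e.g., quantized gaze states).
    """
    n = len(y) - 1
    triples  = Counter(zip(y[1:], y[:-1], x[:-1]))  # counts of (y_{t+1}, y_t, x_t)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))         # counts of (y_t, x_t)
    pairs_yy = Counter(zip(y[1:], y[:-1]))          # counts of (y_{t+1}, y_t)
    singles  = Counter(y[:-1])                      # counts of y_t
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_joint = c / n                                  # p(y_{t+1}, y_t, x_t)
        p_cond_full = c / pairs_yx[(y0, x0)]             # p(y_{t+1} | y_t, x_t)
        p_cond_self = pairs_yy[(y1, y0)] / singles[y0]   # p(y_{t+1} | y_t)
        te += p_joint * log2(p_cond_full / p_cond_self)
    return te
```

Because the estimate is the conditional mutual information of the empirical joint distribution, it is nonnegative; comparing TE(X→Y) against TE(Y→X) for each perception-action pair gives the directionality used to select causal pairs.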