The enactive theory of perception hypothesizes that perceptual access to objects depends on the mastery of sensorimotor contingencies, that is, on the know-how of the regular ways in which changes in sensations depend on changes in movements. This hypothesis can be extended into the social domain: perception of other minds is constituted by mastery of self-other contingencies, that is, by the know-how of the regular ways in which changes in others' movements depend on changes in one's own movements. We investigated this proposal using the perceptual crossing paradigm, in which pairs of players are required to locate each other in an invisible one-dimensional virtual space by using a minimal haptic interface. We recorded and analyzed the real-time embodied social interaction of 10 pairs of adult participants. The results reveal a process of implicit perceptual learning: on average, clarity of perceiving the other's presence increased over trials and then stabilized. However, a clearer perception of the other was not associated with correctness of recognition as such, but with both players correctly recognizing each other. Furthermore, the two players' moments of correct mutual recognition tended to happen within seconds of each other. The fact that changes in social experience can only be explained by successful performance at the level of the dyad, and that this veridical mutual perception tends toward synchronization, leads us to hypothesize that integration of neural activity across both players played a role.