This paper aimed to explore whether human beings can understand gestures produced by telepresence robots. If it were the case, they can derive meaning conveyed in telerobotic gestures when processing spatial information. We conducted two experiments over Skype in the present study. Participants were presented with a robotic interface that had arms, which were teleoperated by an experimenter. The robot could point to virtual locations that represented certain entities. In Experiment 1, the experimenter described spatial locations of fictitious objects sequentially in two conditions: speech condition (SO, verbal descriptions clearly indicated the spatial layout) and speech and gesture condition (SR, verbal descriptions were ambiguous but accompanied by robotic pointing gestures). Participants were then asked to recall the objects' spatial locations. We found that the number of spatial locations recalled in the SR condition was on par with that in the SO condition, suggesting that telerobotic pointing gestures compensated ambiguous speech during the process of spatial information. In Experiment 2, the experimenter described spatial locations nonsequentially in the SR and SO conditions. Surprisingly, the number of spatial locations recalled in the SR condition was even higher than that in the SO condition, suggesting that telerobotic pointing gestures were more powerful than speech in conveying spatial information when information was presented in an unpredictable order. The findings provide evidence that human beings are able to comprehend telerobotic gestures, and importantly, integrate these gestures with co-occurring speech. This work promotes engaging remote collaboration among humans through a robot intermediary.