This article considers tools to support remote gesture in video systems being used to complete collaborative physical tasks-tasks in which two or more individuals work together manipulating three-dimensional objects in the real world. We first discuss the process of conversational grounding during collaborative physical tasks, particularly the role of two types of gestures in the grounding process: pointing gestures, which are used to refer to task objects and locations, and rep- HUMAN-COMPUTER INTERACTION, 2004, Volume 19, pp. 273-309 Copyright © 2004 resentational gestures, which are used to represent the form of task objects and the nature of actions to be used with those objects. We then consider ways in which both pointing and representational gestures can be instantiated in systems for remote collaboration on physical tasks. We present the results of two studies that use a "surrogate" approach to remote gesture, in which images are intended to express the meaning of gestures through visible embodiments, rather than direct views of the hands. In Study 1, we compare performance with a cursor-based 274 FUSSELL ET AL.