The role of perceptual organization in motion analysis has heretofore been minimal. In this work we present a simple but powerful computational model and associated algorithms based on the use of perceptual organizational principles, such as temporal coherence (or common fate) and spatial proximity, for motion segmentation. The computational model does not use the traditional frame by frame motion analysis; rather it treats an image sequence as a single 3D spatio-temporal volume. It endeavors to find organizations in this volume of data over three levels-signal, primitive, and structural. The signal level is concerned with detecting individual image pixels that are probably part of a moving object. The primitive level groups these individual pixels into planar patches, which we call the temporal envelopes. Compositions of these temporal envelopes describe the spatio-temporal surfaces that result from object motion. At the structural level, we detect these compositions of temporal envelopes by utilizing the structure and organization among them. The algorithms employed to realize the computational model include 3D edge detection, Hough transformation, and graph based methods to group the temporal envelopes based on Gestalt principles. The significance of the Gestalt relationships between any two temporal envelopes is expressed in probabilistic terms. One of the attractive features of the adopted algorithm is that it does not require the detection of special 2D features or the tracking of these features across frames. We demonstrate that even with simple grouping strategies, we can easily handle drastic illumination changes, occlusion events, and multiple moving objects, without the use of training and specific object or illumination models. We present results on a large variety of motion sequences to demonstrate this robustness. c 2002 Elsevier Science (USA)