Abstract-This paper describes an original approach to content-based video indexing and retrieval. We aim to provide a global interpretation of the dynamic content of video shots without any prior motion segmentation and without resorting to dense optic flow fields. To this end, we exploit the spatio-temporal distribution, within a shot, of appropriate local motion-related measurements derived from the spatio-temporal derivatives of the intensity function. These distributions are then represented by causal Gibbs models. To be independent of camera movement, the motion-related measurements are computed on the image sequence obtained by compensating for the estimated dominant image motion in the original sequence. The statistical modeling framework adopted makes feasible the exact computation of the conditional likelihood that a video shot belongs to a given motion class or, more generally, to an activity class. This property allows us to develop a general statistical framework for video indexing and retrieval with query-by-example. We build a hierarchical structure of the processed video database according to motion content similarity, which results in a binary tree where each node is associated with an estimated causal Gibbs model. We consider a similarity measure inspired by the Kullback-Leibler divergence. Retrieval with query-by-example is then performed through this binary tree using the maximum a posteriori (MAP) criterion. We have obtained promising results on a set of diverse real image sequences.
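To make the retrieval criterion concrete, the following sketch expresses the MAP selection over tree nodes and a likelihood-based similarity in the spirit of the Kullback-Leibler divergence. The notation here (query shot $x$, node model $M_k$, shot size $|x|$) is introduced only for illustration and is not taken from the paper's body; the normalized log-likelihood ratio is a common sample-based approximation of the KL divergence, assumed here rather than quoted.

% Hypothetical notation: x is the query shot, M_k the causal Gibbs
% model attached to node k of the binary tree, |x| the number of
% measurement sites in the shot.
\begin{align*}
  % MAP retrieval: select the node whose model best explains the query
  k^{*} &= \arg\max_{k} \, P(M_k \mid x)
         = \arg\max_{k} \, P(x \mid M_k)\, P(M_k), \\
  % KL-inspired similarity between the model M_x estimated on the query
  % and a node model M_k, approximated by a normalized log-likelihood
  % ratio evaluated on the query shot itself
  D(M_x, M_k) &\approx \frac{1}{|x|}
      \left[ \log P(x \mid M_x) - \log P(x \mid M_k) \right].
\end{align*}

Under this reading, descending the binary tree amounts to repeatedly applying the MAP criterion at each node until a leaf, and hence a motion-content class, is reached.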