A basic problem of visual perception is how human beings recognize objects after spatial transformations. Three central classes of findings have to be accounted for: (a) Recognition performance varies systematically with orientation, size, and position; (b) recognition latencies are sequentially additive, suggesting analogue transformation processes; and (c) orientation and size congruency effects indicate that recognition involves the adjustment of a reference frame. All 3 classes of findings can be explained by a transformational framework of recognition: Recognition is achieved by an analogue transformation of a perceptual coordinate system that aligns memory and input representations. Coordinate transformations can be implemented neurocomputationally by gain (amplitude) modulation and may be regarded as a general processing principle of the visual cortex.

Keywords: alignment, coordinate transformations, gain modulation, object recognition, reference frames

How can we recognize objects regardless of spatial transformations such as plane and depth rotation, size scaling, and position changes? This ability is often discussed under the label of object constancy or shape constancy. Even young children recognize objects so immediately and effortlessly that it seems to be a rather ordinary and simple task. However, changes in the spatial relation between observer and object lead to large changes in the image that is projected onto the retina. Hence, recognizing objects regardless of orientation, size, and position is not a trivial problem. No computational system proposed so far can successfully recognize objects over wide ranges of object categories and contexts.

Several different approaches have been proposed over the years (for reviews, see Palmeri & Gauthier, 2004; Ullman, 1996). A number of models rely on abstract object representations, which predict that recognition performance is typically invariant with respect to spatial transformations (e.g., structural description models; see Hummel & Biederman, 1992; Marr & Nishihara, 1978). In contrast, image-based or view-based models propose that object representations are close to the format of the perceptual input and therefore depend systematically on image transformations (e.g.,