Abstract. We present a novel framework for learning a joint shape and appearance model from a large set of unlabelled training examples in arbitrary positions and orientations. The shape and intensity spaces are unified by implicitly representing shapes as "images" in the space of distance transforms. A stochastic chord-based matching algorithm is developed to align photo-realistic training examples under a common reference frame. Dense local deformation fields, represented using cubic B-spline based Free Form Deformations (FFD), are then recovered to register the training examples in both shape and intensity spaces. Principal Component Analysis (PCA) is applied to the FFD control lattices and to the registered object interior textures to capture variations in shape and appearance, respectively. We show examples in which we built coupled shape and appearance prior models for the left ventricle and the whole heart in short-axis cardiac tagged MR images, and used them to delineate the heart chambers in noisy, cluttered images. We also validate the automatic segmentation results quantitatively by comparing them with expert solutions.
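To make the FFD representation concrete, here is a minimal sketch of how a 2D cubic B-spline control lattice induces a dense displacement field. It assumes a uniform lattice with spacing `spacing` and the standard uniform cubic B-spline basis. The function names (`bspline_basis`, `ffd_displacement`, `make_lattice`) and the lattice layout are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def bspline_basis(t: float) -> Tuple[float, float, float, float]:
    """Uniform cubic B-spline basis functions B0..B3 at t in [0, 1); they sum to 1."""
    return (
        (1 - t) ** 3 / 6.0,
        (3 * t**3 - 6 * t**2 + 4) / 6.0,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0,
        t**3 / 6.0,
    )


def make_lattice(nx: int, ny: int, fill: Vec = (0.0, 0.0)) -> List[List[Vec]]:
    """Hypothetical layout: lattice[i][j] is the displacement of the control point
    at ((i - 1) * spacing, (j - 1) * spacing). The padding (nx + 3, ny + 3) lets
    every point in [0, nx*spacing) x [0, ny*spacing) use a full 4x4 neighbourhood."""
    return [[fill for _ in range(ny + 3)] for _ in range(nx + 3)]


def ffd_displacement(x: float, y: float, lattice: List[List[Vec]], spacing: float) -> Vec:
    """Displacement at (x, y): a tensor-product B-spline blend of the 4x4
    control-point displacements surrounding the point."""
    cx, cy = math.floor(x / spacing), math.floor(y / spacing)
    u, v = x / spacing - cx, y / spacing - cy  # local coordinates within the cell
    bu, bv = bspline_basis(u), bspline_basis(v)
    dx = dy = 0.0
    for l in range(4):
        for m in range(4):
            w = bu[l] * bv[m]
            px, py = lattice[cx + l][cy + m]
            dx += w * px
            dy += w * py
    return dx, dy


# Usage: displace the control point at (10, 10) by one unit in x.
lat = make_lattice(4, 4)
lat[2][2] = (1.0, 0.0)
print(ffd_displacement(10.0, 10.0, lat, 10.0))  # approximately (4/9, 0.0)
```

Because each point depends only on its 4x4 neighbourhood of control points, the deformation is local and smooth. Under this sketch, flattening each registered example's lattice into a single vector would give the samples on which PCA over control lattices could then be run.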