This paper presents a method for learning mixed templates for view-invariant object recognition. A template is composed of 3D and 2D primitives, which are stick-like elements defined in 3D and 2D space, respectively. The primitives are allowed to perturb within a local range to account for variations among instances of an object category. When projected onto images, the appearance of these primitives is represented by Gabor filters. Both 3D and 2D primitives have parameters describing their visible range on the viewing hemisphere. Our algorithm sequentially selects primitives and builds a probabilistic model from the selected primitives. The order of selection is determined by the information gains of the primitives, which can be estimated efficiently together with the visible-range parameters. In experiments, we evaluate the performance of the learned 3D templates on car recognition and pose estimation. We also show that the algorithm learns intuitive mixed templates for various object categories, which suggests that our method could serve as a numerical tool for examining the debate over viewer-centered and object-centered representations.
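As an illustration of the sequential selection described above, the following is a minimal sketch, not the authors' implementation. It assumes that the viewing hemisphere has been discretized into an ordered list of view bins, that a per-view information gain is already available for each candidate primitive, and that a primitive's visible range is a single contiguous run of bins. The names `best_visible_range`, `select_primitives`, and the candidate labels are hypothetical. For simplicity the gains are kept fixed between selections, whereas a fuller version would presumably re-estimate them after each pick.

```python
from typing import Dict, List, Tuple


def best_visible_range(gains: List[float]) -> Tuple[float, int, int]:
    """Return (total_gain, start, end) for the contiguous run of view bins
    with the largest summed gain (maximum-subarray scan); end is inclusive."""
    best_sum, best_start, best_end = gains[0], 0, 0
    run_sum, run_start = 0.0, 0
    for i, g in enumerate(gains):
        if run_sum <= 0.0:
            # A non-positive running sum cannot help: start a new run here.
            run_sum, run_start = g, i
        else:
            run_sum += g
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i
    return best_sum, best_start, best_end


def select_primitives(
    per_view_gain: Dict[str, List[float]],
    max_primitives: int,
    min_gain: float = 0.0,
) -> List[Tuple[str, float, int, int]]:
    """Greedily pick primitives in decreasing order of information gain.

    Each candidate is scored jointly with its best visible range. Selection
    stops when no remaining candidate exceeds min_gain (an assumed stopping
    rule) or when max_primitives have been chosen.
    """
    remaining = dict(per_view_gain)
    selected: List[Tuple[str, float, int, int]] = []
    while remaining and len(selected) < max_primitives:
        scored = {name: best_visible_range(g) for name, g in remaining.items()}
        name = max(scored, key=lambda n: scored[n][0])
        gain, start, end = scored[name]
        if gain <= min_gain:
            break
        selected.append((name, gain, start, end))
        del remaining[name]
    return selected


# Hypothetical per-view gains over four view bins (illustrative values only).
candidates = {
    "3d_stick_roof": [1.0, 2.0, 2.0, 1.0],
    "2d_stick_wheel": [-1.0, 3.0, 4.0, -2.0],
    "clutter": [-1.0, -0.5, -2.0, -1.0],
}
template = select_primitives(candidates, max_primitives=3)
for name, gain, start, end in template:
    print(f"{name}: gain={gain}, visible in view bins {start}..{end}")
```

In this toy input, the 2D candidate is informative only in the middle view bins and so receives a narrow visible range, while the 3D candidate is informative across all bins; the candidate with no positive gain is never selected.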