In this paper, we propose a novel method for projecting data from multiple modalities to a new subspace optimized for one-class classification. The proposed method iteratively transforms the data from the original feature space of each modality to a new common feature space while simultaneously learning a joint compact description of the data coming from all modalities. For each modality, we define a separate transformation that maps the data from its original feature space to the new optimized subspace, exploiting information from the class of interest only. The data description in the new subspace is obtained by Support Vector Data Description. We also propose different regularization strategies for the method and provide both linear and non-linear formulations. We conduct experiments on two multimodal datasets and compare the proposed approach with baseline and recently proposed one-class classification methods, both combined with early fusion and applied to each modality separately. We show that the proposed Multimodal Subspace Support Vector Data Description outperforms all methods using data from a single modality and performs at least as well as the methods fusing data from all modalities.
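To make the alternating structure of the method concrete, the following is a minimal sketch of the idea: one linear projection per modality into a shared subspace, updated in alternation with a one-class description of the pooled projected data. The function name `fit_ms_svdd`, the gradient heuristic for updating the projections, and the use of scikit-learn's `OneClassSVM` as a stand-in for Support Vector Data Description are all illustrative assumptions, not the paper's actual optimization procedure.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_ms_svdd(modalities, d=2, n_iters=50, lr=0.1, seed=0):
    """Sketch: learn one projection per modality into a shared
    d-dimensional subspace, alternating with a one-class model
    fit on the pooled projected data (stand-in for SVDD)."""
    rng = np.random.default_rng(seed)
    # One linear map per modality, (D_m x d), columns orthonormalized.
    Ws = [np.linalg.qr(rng.standard_normal((X.shape[1], d)))[0]
          for X in modalities]
    svdd = OneClassSVM(kernel="rbf", nu=0.1)  # stand-in for SVDD
    for _ in range(n_iters):
        # Project every modality into the common subspace and pool.
        Z = np.vstack([X @ W for X, W in zip(modalities, Ws)])
        svdd.fit(Z)
        center = Z.mean(axis=0)  # proxy for the description center
        # Nudge each projection so its data moves toward the joint
        # center (a compactness heuristic, not the paper's update rule).
        for m, (X, W) in enumerate(zip(modalities, Ws)):
            resid = X @ W - center           # per-sample deviation
            grad = X.T @ resid / len(X)      # grad of 0.5*||XW - c||^2
            Ws[m] = np.linalg.qr(W - lr * grad)[0]  # re-orthonormalize
    return Ws, svdd
```

At test time, each modality of a sample would be projected with its own learned map and scored against the joint description, which is what lets a single compact boundary serve all modalities at once.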