Developing methods that process polarimetric synthetic aperture radar (PolSAR) and multispectral (MS) data together, without losing information from either modality, remains a challenge in remote sensing (RS) applications. This paper introduces novel deep learning based RS data processing frameworks that use convolutional neural networks (CNNs) in both the spatial and spectral domains to perform land cover (LC) classification with PolSAR-MS data. Because remotely sensed earth observation data usually have greater spectral depth than ordinary camera images, exploiting the spectral information in RS data is crucial. Convolutions in the subspectral space are, in fact, an intuitive alternative to feature selection. Researchers have recently had success exploiting the spectral information of RS data with CNNs, especially for hyperspectral data. This paper proposes exploiting the spectral information in PolSAR-MS data through a permuted localized spectral convolution combined with a localized spatial convolution. The study also establishes the significance of performing permuted localized spectral convolutions rather than non-localized or localized spectral convolutions. Two models are proposed: a permuted local spectral convolutional network (Perm-LS-CNN) and a permuted local spectral-spatial convolutional network (Perm-LSS-CNN). Both models are trained on ground truth class data points measured directly on the terrain. Generalization performance is evaluated using ground truth knowledge of selected well-known regions in the study areas. Comparison with other popular machine learning classifiers shows that the Perm-LSS-CNN model provides better classification results in terms of both accuracy and generalization.
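To make the idea of a permuted localized spectral convolution concrete, the following minimal NumPy sketch shows one plausible reading of it; it is not the architecture of Perm-LS-CNN or Perm-LSS-CNN. It assumes that each pixel is a vector of stacked PolSAR-MS channels, that "localized" means a small kernel sliding along the band axis, and that "permuted" means reordering the bands before that kernel is applied, so that channels which are not adjacent in the original stacking fall inside the same window. The function names, the six-band toy input, the fixed kernel, and the permutation are all hypothetical choices made for illustration.

```python
import numpy as np

def local_spectral_conv(pixels, kernel):
    """Slide a small kernel along the band axis (no padding).

    pixels: array of shape (n_pixels, n_bands)
    kernel: array of shape (k,), k <= n_bands
    Returns an array of shape (n_pixels, n_bands - k + 1).
    As in CNN layers, this is a cross-correlation (the kernel is not flipped).
    """
    k = len(kernel)
    # windows has shape (n_pixels, n_bands - k + 1, k)
    windows = np.lib.stride_tricks.sliding_window_view(pixels, k, axis=1)
    return windows @ kernel

def permuted_local_spectral_conv(pixels, kernel, permutation):
    """Reorder the bands, then apply the localized spectral convolution.

    permutation: sequence of band indices (assumed to be a permutation of
    range(n_bands)); in this sketch it is fixed, not learned.
    """
    return local_spectral_conv(pixels[:, permutation], kernel)

# Toy input: 2 pixels with 6 stacked channels (hypothetical values).
x = np.arange(12, dtype=float).reshape(2, 6)
w = np.array([1.0, 0.0, -1.0])      # hypothetical 3-band kernel
perm = [5, 0, 3, 1, 4, 2]           # hypothetical band ordering

plain = local_spectral_conv(x, w)                    # mixes only neighbouring bands
permuted = permuted_local_spectral_conv(x, w, perm)  # mixes bands brought together by perm
```

In this sketch the kernel without permutation only ever combines bands that are neighbours in the original stacking, whereas the permuted variant lets the same small kernel combine, for example, the first and last channels. In a trained network the kernel weights would be learned rather than fixed as they are here.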