Conventional remote sensing data analysis techniques suffer from a significant bottleneck: they operate on selectively chosen, small-scale datasets. The availability of enormous volumes of data demands methods that can handle large-scale, diverse data, which neural network-based architectures have made possible. This work exploits the contextual-information-capturing ability of deep neural networks (DNNs), specifically investigating the multispectral band properties of Sentinel-2 image patches. Moreover, an increase in spatial resolution often leads to non-linear mixing of land-cover types within a target resolution cell. Recognizing this, we group the bands according to their spatial resolutions and propose a classification and retrieval framework. We design a representation learning framework for classifying the multispectral data, first utilizing all the bands and then using the bands grouped by spatial resolution. We also propose a novel triplet-loss function for multi-labeled images and use it to design an inter-band group retrieval framework, demonstrating its effectiveness over the conventional triplet-loss function. Finally, we present a comprehensive discussion of the obtained results, thoroughly analyzing the performance of the band groups on various land-cover and land-use areas spanning agro-forestry regions, water bodies, and human-made structures. Experimental results for the classification and retrieval frameworks on the benchmark BigEarthNet dataset exhibit marked improvements over existing studies.
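For context on the baseline the proposed loss is compared against, the following is a minimal sketch of the conventional triplet loss, together with one plausible ingredient for a multi-label extension. The function names, the squared-Euclidean distance, and the use of Jaccard overlap between label sets to grade positives and negatives are all illustrative assumptions; the paper's actual multi-label triplet-loss formulation is not reproduced here.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Conventional triplet loss: pull the anchor embedding toward the
    # positive and push it away from the negative by at least `margin`.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

def label_overlap(labels_a, labels_b):
    # Jaccard overlap between two multi-label sets. In a multi-label
    # setting, a "positive" is not simply a same-class image; one
    # plausible choice (our assumption, not the paper's definition) is
    # to grade triplets by the degree of label-set overlap.
    a, b = set(labels_a), set(labels_b)
    return len(a & b) / len(a | b)

# Example: a well-separated triplet incurs zero loss.
anchor = np.zeros(4)
positive = np.zeros(4)
negative = np.full(4, 10.0)
print(triplet_loss(anchor, positive, negative))   # 0.0
print(label_overlap(["forest", "water"], ["water", "urban"]))
```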